Washington, D.C. is a frequent travel destination for both business and vacation. This report offers insights into the area's Airbnb listings with regard to cost and rating by location, host, and individual listing. Listing prices also tend to fluctuate with the day of the week and the time of year, so time variables are included as well.
When planning a stay at an Airbnb listing, customers want to ensure their choice is the best option. The purpose of this project is to help answer the questions consumers ask when making that choice.
This interactive report will help narrow your choices for staying in Washington, D.C. At the bottom of this report you can even choose your favorite one and the Airbnb listing URL will open in a new browser tab!
# SETUP: libraries used by the code in this report
import webbrowser

import folium
import geopandas as gpd
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.ticker import FuncFormatter
path = "/Users/Nolan/DS736/Python_datafiles/"
filename1 = "listings.csv"
filename2 = "calendar.csv"
df_listings = pd.read_csv(path+filename1)
df_calendar = pd.read_csv(path+filename2)
There were two datasets used for this project:
A listings dataset of the 7892 listings in Washington, D.C., with 74 columns per row detailing numerous aspects of each listing's rating, host, availability, location, price, and physical details.
A calendar dataset of 2.8 million rows of booking availability and prices for each listing on each day from 12/15/21 to 12/14/22.
The Airbnb datasets can be found at: http://insideairbnb.com/get-the-data.html
If you are interested, you can view the dataset variables with the tabs above!
for i in df_listings.columns:
    print(i, end= '. ')
## id. listing_url. scrape_id. last_scraped. name. description. neighborhood_overview. picture_url. host_id. host_url. host_name. host_since. host_location. host_about. host_response_time. host_response_rate. host_acceptance_rate. host_is_superhost. host_thumbnail_url. host_picture_url. host_neighbourhood. host_listings_count. host_total_listings_count. host_verifications. host_has_profile_pic. host_identity_verified. neighbourhood. neighbourhood_cleansed. neighbourhood_group_cleansed. latitude. longitude. property_type. room_type. accommodates. bathrooms. bathrooms_text. bedrooms. beds. amenities. price. minimum_nights. maximum_nights. minimum_minimum_nights. maximum_minimum_nights. minimum_maximum_nights. maximum_maximum_nights. minimum_nights_avg_ntm. maximum_nights_avg_ntm. calendar_updated. has_availability. availability_30. availability_60. availability_90. availability_365. calendar_last_scraped. number_of_reviews. number_of_reviews_ltm. number_of_reviews_l30d. first_review. last_review. review_scores_rating. review_scores_accuracy. review_scores_cleanliness. review_scores_checkin. review_scores_communication. review_scores_location. review_scores_value. license. instant_bookable. calculated_host_listings_count. calculated_host_listings_count_entire_homes. calculated_host_listings_count_private_rooms. calculated_host_listings_count_shared_rooms. reviews_per_month.
for i in df_calendar.columns:
    print(i, end= '. ')
## listing_id. date. available. price. adjusted_price. minimum_nights. maximum_nights.
This is a choropleth map of the listings for each neighborhood in the Washington, D.C. area. The density of listings within each neighborhood is represented by the intensity of color, or heat. If you hover over an area, the tooltip will show the name of the neighborhood and details about the area by count, price, and rating.
The heat is most intense in the central neighborhoods surrounding D.C. attractions such as the monuments, zoo, museums, and the National Mall.
Travel around D.C. is fairly easy thanks to widely available Uber service, bikesharing, and the Metro. The adjacent neighborhoods may have higher ratings and lower costs, so planning a stay a bit farther out could be a better option.
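As a rough illustration of the per-neighborhood figures shown in the tooltip (count, average price, average rating), here is a minimal standard-library sketch. The sample rows and the helper name `summarize_by_neighborhood` are hypothetical and are not part of the report's own code, which does the same aggregation with pandas.

```python
from collections import defaultdict

# Hypothetical sample rows: (neighborhood, nightly price, rating)
listings = [
    ("Dupont Circle", 200.0, 4.5),
    ("Dupont Circle", 100.0, 5.0),
    ("Capitol Hill", 150.0, 4.9),
]


def summarize_by_neighborhood(rows):
    """Return {neighborhood: (count, average price, average rating)}."""
    groups = defaultdict(list)
    for name, price, rating in rows:
        groups[name].append((price, rating))

    summary = {}
    for name, values in groups.items():
        count = len(values)
        avg_price = sum(price for price, _ in values) / count
        avg_rating = sum(rating for _, rating in values) / count
        # Round to two decimals, as the tooltip values are rounded
        summary[name] = (count, round(avg_price, 2), round(avg_rating, 2))
    return summary


summary = summarize_by_neighborhood(listings)
```

Each neighborhood ends up with one tuple, which is the same shape of information the map attaches to each polygon.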
#TOP TEN REVIEWS
top_ = df_listings.sort_values(by='number_of_reviews', ascending=False)
top_ten_df=top_[:10]
top_ten_df.reset_index(inplace=True, drop=True)
#PRICE DF
# .copy() avoids modifying a view of df_listings when the price column is cleaned
n_price_df = df_listings[['id','host_id','host_name','host_total_listings_count','neighbourhood_cleansed','price','review_scores_rating', 'number_of_reviews', 'room_type']].copy()
n_price_df['price']=n_price_df['price'].str.replace('[$,]','', regex = True)
n_price_df['price']=n_price_df['price'].astype(float)
#x DF
x = n_price_df.groupby(['neighbourhood_cleansed']).agg({'neighbourhood_cleansed':['count'],
'price':['mean', 'min','max'],
'review_scores_rating': ['mean']}).reset_index()
x.columns = ['neighborhood','count','AverPrice','MinPrice','MaxPrice', 'AverRating']
# MAP
bnb_geo = path + 'neighbourhoods.geojson'
air_geo=gpd.read_file(bnb_geo)
geo_df = air_geo.merge(x, left_on='neighbourhood', right_on='neighborhood', how='outer')
geo_df['AverPrice'] =round(geo_df['AverPrice'],2)
geo_df['AverRating']=round(geo_df['AverRating'],2)
geo_df['AverPrice'] = '$'+geo_df['AverPrice'].astype(str)
center_of_map = [38.9072,-77.0369]
my_map = folium.Map(location = center_of_map,
zoom_start = 12,
tiles = 'cartodbpositron',
width = '90%',
height = '100%',
left = '5%',
right = '5%',
top = '0%',
no_touch=True)
count_map = folium.Choropleth(geo_data = bnb_geo,
name = 'Count',
data=geo_df,
columns = ['neighbourhood', 'count'],
key_on = 'feature.properties.neighbourhood',
fill_color = 'PuBu',
fill_opacity=.7,
highlight = True,
overlay= True,
show=True,
legend_name='Count of Listings'
).add_to(my_map)
folium.features.GeoJson(
data=geo_df,
tooltip=folium.features.GeoJsonTooltip(
fields=['neighbourhood',
'count',
'AverPrice',
'AverRating'],
aliases=['Neighborhood: ',
'Listings Count: ',
'Average Listing Price: ',
'Average Listing Rating: '],
localize=True,
style = (
'background-color:white; color:midnightblue')
),
style_function = lambda x:{
'color':'black',
'fillColor':'transparent',
'weight':0.5},
highlight_function = lambda x:{
'weight':3,
'color': 'black'
},
overlay=True,
control=False
).add_to(my_map)
my_map.save(path+'Chloropleth_DC.html')
This section dives deeper into the ratings, cost, and listings of each neighborhood. For those interested in staying in D.C. without a specific area in mind, these charts can help determine a destination area.
The graph below is a horizontal bar chart depicting each neighborhood’s average listing cost per night. The colors represent the neighborhood’s average rating: green for above average, gray for about average (within 1% of the D.C. average), and red for below average.
Lower-cost neighborhoods tend to have slightly better ratings, and higher-cost neighborhoods slightly lower ones. This is potentially due to “value” being a factor in the rating scores.
Notice the vertical dashed line for price comparison to D.C. average: $189.69.
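The color rule described above can be sketched as a small standalone function. The name `rating_color` and the `tolerance` parameter are assumptions made for illustration; the 1% band and the color names mirror the plotting code in this report.

```python
def rating_color(avg_rating, overall_avg, tolerance=0.01):
    """Pick a bar color by comparing a neighborhood's average rating
    with the overall average, using a +/- tolerance band (1% by default)."""
    if avg_rating > overall_avg * (1 + tolerance):
        return "lime"      # above average
    if avg_rating < overall_avg * (1 - tolerance):
        return "tab:red"   # below average
    return "grey"          # within the band around the average


# Hypothetical ratings compared against a hypothetical overall average of 4.7
colors = [rating_color(r, 4.7) for r in (4.9, 4.7, 4.6)]
```

Using a band rather than a strict comparison keeps neighborhoods that sit almost exactly on the average from being labeled as above or below it.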
#FUNCTION
def pick_colors_compared_to_total_rating(this_data):
    colors = []
    avg = df_listings.review_scores_rating.mean()
    for each in this_data.AverRating:
        if each > avg*1.01:
            colors.append('lime')
        elif each < avg*0.99:
            colors.append('tab:red')
        else:
            colors.append('grey')
    return colors
## PLOT 1
x = x.sort_values('AverRating', ascending=True)
my_colors = pick_colors_compared_to_total_rating(x)
Above = mpatches.Patch(color ='lime', label='Above Average')
At = mpatches.Patch(color ='grey', label='Within 1% of the Average')
Below = mpatches.Patch(color ='tab:red', label='Below Average')
ax1=plt.figure(figsize = (20,12))
plt.barh(x.neighborhood,
x.AverPrice,
color = my_colors,
edgecolor='lightsteelblue',
linewidth = 2)
plt.xlabel('Cost in Dollars', fontsize=14)
plt.axvline(n_price_df.price.mean(),
linestyle= 'dashed',
color= 'black')
plt.text(n_price_df.price.mean()+1,
len(pd.unique(x.neighborhood)),
'Mean Cost of Airbnb in Washington, D.C. = $'+ str(round((n_price_df.price.mean()),2)),
rotation=0,
fontsize=10)
plt.title('Barplot: Cost of Airbnb by Neighborhood \n Grouped by Rating', fontsize=20, loc='left')
plt.legend(handles=[Above, At, Below],
fontsize=14,
title="Average Rating of Airbnb in Neighborhood",
frameon=True,
shadow=True)
plt.tight_layout()
plt.show()
The scatterplot below shows each listing in the neighborhood by rating, room type, and price. For reasonable comparison, listings must have at least 10 reviews to be included and must be listed for under $2,500 per night.
Each bubble represents a listing, the size represents the price, and the color represents the room type: entire home/apt, private room, shared room, or hotel room. The listings included are mainly entire homes or apartments, but a good number of private rooms are available as well. To better understand the distribution of listings, a table is included below the graph.
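The inclusion rule and the room-type table can be illustrated with a minimal standard-library sketch. The sample rows and the helper `keep_for_scatter` are hypothetical; only the two thresholds (at least 10 reviews, under $2,500 per night) come from the text above.

```python
from collections import Counter

# Hypothetical sample listings
listings = [
    {"room_type": "Entire home/apt", "price": 180.0, "number_of_reviews": 25},
    {"room_type": "Private room", "price": 90.0, "number_of_reviews": 4},
    {"room_type": "Private room", "price": 75.0, "number_of_reviews": 12},
    {"room_type": "Entire home/apt", "price": 3000.0, "number_of_reviews": 40},
]


def keep_for_scatter(row, min_reviews=10, max_price=2500):
    """True if the listing has enough reviews and is under the price cap."""
    return row["number_of_reviews"] >= min_reviews and row["price"] < max_price


kept = [row for row in listings if keep_for_scatter(row)]

# Count the remaining listings by room type, like the table under the graph
room_counts = Counter(row["room_type"] for row in kept)
```

In this sample, the listing with 4 reviews and the $3,000 listing are both dropped, leaving one listing of each remaining room type.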
# X2 DF
x2 = n_price_df.copy()
x2=x2[(x2.number_of_reviews >=10) & (x2.price <2500)]
# PLOT 2
colors = {'Entire home/apt':'slateblue','Private room':'lightcoral', 'Shared room':'red', 'Hotel room':'black'}
Home = mpatches.Patch(color ='slateblue', label='Entire home/apt')
Private = mpatches.Patch(color ='lightcoral', label='Private room')
Shared = mpatches.Patch(color ='red', label='Shared room')
Hotel = mpatches.Patch(color ='black', label='Hotel room')
fig = plt.figure(figsize=(18,10))
ax2 = plt.scatter(x2['review_scores_rating'],
x2['neighbourhood_cleansed'],
c=x2['room_type'].map(colors),
s=x2['price'], edgecolor= 'black')
plt.axvline(x2['review_scores_rating'].mean(), linestyle= 'dashed', color= 'black')
plt.text(x2.review_scores_rating.mean()+0.01,
len(pd.unique(x2['neighbourhood_cleansed'])),
'Mean = '+ str(round((x2.review_scores_rating.mean()),2)),
rotation=0,
fontsize=10)
plt.legend(handles=[Home, Private, Shared, Hotel],
frameon=True,
shadow=True)
plt.title('Scatterplot: Ratings for Airbnbs by Neighborhood \n(excluded: <10 ratings & >$2500/night)', size = 20, loc='left')
plt.tight_layout()
plt.show()
count_table=x2.room_type.value_counts()
print(count_table.to_string())
## Entire home/apt 2871
## Private room 763
## Shared room 43
## Hotel room 20
Each listing's price can vary with the time of year or the day of the week. For example, a private room in Dupont Circle could be $175 in December but $250 in July due to seasonal attractions. The same is true for days of the week: Tuesday’s price could differ from Saturday’s.
This section highlights the occurrences of price trends throughout the year for the entire Washington, D.C. area.
The heat intensifies as the price increases. The average price for listings in the D.C. area ranges from about $190 to $214.
Notably, Friday and Saturday are the most expensive days of the week to stay in D.C., while spring through fall are the most expensive months. This could benefit someone making a business trip on a weekday in the winter!
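The averaging behind the heatmap can be sketched without pandas as follows. The sample dates and prices are hypothetical, and the function name `average_price_by_month_and_weekday` is an assumption for illustration; the report's own code performs the same grouping on the calendar dataset.

```python
from collections import defaultdict
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical calendar rows: (date, nightly price)
rows = [
    (date(2022, 7, 1), 250.0),  # Friday
    (date(2022, 7, 8), 230.0),  # Friday
    (date(2022, 7, 5), 200.0),  # Tuesday
    (date(2022, 1, 4), 175.0),  # Tuesday
]


def average_price_by_month_and_weekday(rows):
    """Return {(month, weekday): mean price} for the given rows."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [price sum, row count]
    for day, price in rows:
        # date.weekday() returns 0 for Monday through 6 for Sunday
        key = (MONTHS[day.month - 1], DAYS[day.weekday()])
        totals[key][0] += price
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


averages = average_price_by_month_and_weekday(rows)
```

Each (month, weekday) pair corresponds to one cell of the heatmap.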
#PRICE DF FROM CALENDAR
price_df=df_calendar.copy()
price_df['price']=price_df['price'].str.replace('[$,]','', regex = True)
price_df['price'] =price_df['price'].str.slice(0,-3,1)
price_df['price'] = price_df['price'].astype(float)
price_df['adjusted_price']=price_df['adjusted_price'].str.replace('[$,]','', regex = True)
price_df['adjusted_price'] =price_df['adjusted_price'].str.slice(0,-3,1)
price_df['adjusted_price'] = price_df['adjusted_price'].astype(float)
price_df['date'] =pd.to_datetime(price_df['date'], format = '%Y/%m/%d')
price_df['WeekDay'] = price_df.date.dt.strftime('%a')
price_df['MonthName'] = price_df.date.dt.strftime('%b')
price_df=price_df.groupby(['MonthName', 'WeekDay']).agg({'price':'mean',
'adjusted_price': 'mean'}).reset_index()
price_df = price_df.groupby(['MonthName', 'WeekDay'])['price'].sum().reset_index(name='Price_Adjust')
price_df = price_df.pivot(index='MonthName', columns = 'WeekDay', values = 'Price_Adjust')
month_order =['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
day_order = ['Mon','Tue','Wed','Thu','Fri','Sat','Sun']
price_df=price_df.reindex(month_order,columns=day_order)
price_df=price_df.T
# PLOT 3
fig = plt.figure(figsize=(18,10))
ax = fig.add_subplot(1,1,1)
dollar_fmt = FuncFormatter(lambda x,p: format(float(x),','))
ax = sns.heatmap(price_df,
annot = True,
linewidth=0,
cmap = 'Greens',
fmt = '.2f',
square=False,
annot_kws={'size': 12},
cbar_kws={'extend':'both',
'format':'$% .0f'}
)
plt.title('Average Listing Price by Month and Day \n Heatmap', fontsize=20, pad=15)
plt.xlabel('Month', fontsize=18)
plt.ylabel('Day', fontsize=18)
plt.yticks(rotation=0, size=14)
plt.xticks(size=14)
cbar = ax.collections[0].colorbar
max_count = round(price_df,0).to_numpy().max()
my_colorbar_ticks = [*range(185, int(max_count)+2,5)]
cbar.set_ticks(my_colorbar_ticks)
cbar.set_label('Listing Price ($)', rotation=270, fontsize=16, labelpad=20)
plt.tight_layout()
plt.show()
The graph below is a bump chart that ranks the months by average price for each day of the week. Each point shows the average cost for that month and day of the week.
The colder months are consistently the cheapest, as seen in the heatmap, but here we can also see that prices during the warmer months vary much more with the day of the week.
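To make the ranking step concrete, here is a small sketch of how months could be ranked for a single day of the week, with the most expensive month ranked 1 and tied months sharing the lowest rank (the behavior of the `method='min'` ranking used in this report). The function name `rank_descending` and the sample prices are hypothetical.

```python
def rank_descending(values):
    """Rank a {label: value} mapping so the largest value gets rank 1.
    Tied values share the lowest (best) rank."""
    return {
        label: 1 + sum(other > value for other in values.values())
        for label, value in values.items()
    }


# Hypothetical average Saturday prices for four months
saturday_prices = {"Jan": 190.0, "Apr": 205.0, "Jul": 214.0, "Oct": 205.0}
saturday_ranks = rank_descending(saturday_prices)
```

Repeating this for each day of the week gives one rank per month per day, which is what the lines of a bump chart connect.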
# Rank
price_df=price_df.T
price_df_ranked = price_df.rank(0, ascending=False, method='min')
price_df_ranked=price_df_ranked.T
cmap = plt.get_cmap('tab20c')
month_num=list(range(1,17))
del month_num[3::4]
colors=cmap(month_num)
price_df=price_df.T
fig=plt.figure(figsize = (18,10))
ax = fig.add_subplot(1,1,1)
price_df_ranked.plot(kind='line',
ax=ax,
marker='o',
color=colors,
markeredgewidth =2,
linewidth=6,
markersize=44,
markerfacecolor='w'
)
ax.invert_yaxis()
num_rows = price_df_ranked.shape[0]
num_cols = price_df_ranked.shape[1]
plt.ylabel('Price Ranking', fontsize=18, labelpad=10)
plt.title('Ranking of Average Cost per Night by Day of Week and Month \n Bump Chart', fontsize=20, pad=15)
plt.yticks(range(1, len(price_df.columns)+1,1), fontsize=14)
plt.xticks(fontsize=14)
plt.xlabel('Day of Week', fontsize=18)
ax.legend(bbox_to_anchor=(1.01,1.01),
markerscale=0.4,
fontsize=14,
labelspacing=1,
borderpad=1)
# Label each point with its average price: j indexes months (columns), i indexes days (rows)
for j, eachcol in enumerate(price_df.columns):
    for i, eachrow in enumerate(price_df.index):
        this_rank = price_df_ranked.iloc[i,j]
        ax.text(i, this_rank, '$' + str(round(price_df.iloc[i,j],2)), ha='center', va='center', fontsize=10)
plt.show()
Unlike a stay at a known hotel chain, a night at an Airbnb can seem unpredictable. That unpredictability is both an appeal and a deterrent when booking a niche experience. This section highlights the listings with the most reviews and the hosts with the most listings, as a way to look into listings with high-quantity attributes.
Listings with a large number of reviews tend to have more reliable ratings, while hosts with a large number of listings can be very consistent in their quality of management. The following charts can be useful to those who value reliability.
Below is a dual bar chart of the 10 hosts with the most listings in the area and the average cost of their listings. Blue columns represent the number of listings, and orange columns represent the average price.
Notably, only Carolyn and Svetlana are actual individual people. The listings found here are mainly owned and operated by businesses, which provides an interesting middle ground between traditional Airbnb listings and standard hotel chains.
In the next section, you will be able to find these listings to see more details.
host_df = n_price_df.groupby(['host_id', 'host_name','host_total_listings_count'])['price'].mean().reset_index(name='AverPrice')
ten_host_df=host_df.sort_values('host_total_listings_count', ascending=False)[0:10]
def autolabel(bars, ax, place_of_decimals, symbol):
    # Write each bar's formatted height just above the bar
    for bar in bars:
        height = bar.get_height()
        ax.text(bar.get_x()+bar.get_width()/2,
                height,
                symbol+format(height, place_of_decimals),
                ha='center',
                va='bottom')
fig = plt.figure(figsize=(20,12))
ax1 = fig.add_subplot(1,1,1)
ax2 = ax1.twinx()
bar_width=0.4
x_pos = np.arange(10)
count_bars= ax1.bar(x_pos-(0.5*bar_width),
ten_host_df.host_total_listings_count,
bar_width,
color='tab:blue',
edgecolor='black',
label = 'Listings Count'
)
price_bars= ax2.bar(x_pos+(0.5*bar_width),
ten_host_df.AverPrice,
bar_width,
color='tab:orange',
edgecolor='black',
label = 'Average Price of Listing'
)
plt.title('Top Ten Hosts with the Most Listings: \n Listing Count and Average Price of Listing', fontsize=20, pad = 15)
ax1.set_xticks(x_pos)
ax1.set_xticklabels(ten_host_df.host_name, fontsize=11)
ax1.set_xlabel('Host Name', fontsize=18)
ax1.set_ylabel('Count of Listings', fontsize=18, labelpad=20)
ax1.tick_params(axis='y', labelsize=14)
ax2.set_ylabel('Average Price of Listing ($)',rotation=270, fontsize=18, labelpad=20)
ax2.tick_params(axis='y', labelsize=14)
count_color, count_label = ax1.get_legend_handles_labels()
price_color, price_label = ax2.get_legend_handles_labels()
legend = ax1.legend(count_color + price_color,
count_label + price_label,
loc = 'upper right',
frameon=True, ncol=1, shadow=True, borderpad=1, fontsize=14)
ax2.set_ylim(0,ten_host_df.AverPrice.max()*1.2)
autolabel(price_bars, ax2,'.2f', '$')
autolabel(count_bars, ax1, '.0f', '')
plt.tight_layout()
plt.show()
The graph below shows the 10 listings with the most reviews. These are all listings owned by individuals, unlike the previous graph, which was dominated by businesses. Blue columns represent the number of reviews, and orange columns represent the price of the listing.
These listings could be considered reliable in their rating and are all under the average price for D.C.
In the next section, you will be able to find these listings to see more details.
# TOP 10 LIST
review_df=top_ten_df[['id','name','host_id','host_name','price','number_of_reviews','review_scores_rating','review_scores_accuracy',
'review_scores_cleanliness','review_scores_checkin','review_scores_communication','review_scores_location',
'review_scores_value','picture_url']].copy()
review_df['price']=review_df['price'].str.replace('[$,]','', regex = True)
review_df['price'] =review_df['price'].str.slice(0,-3,1)
review_df['price'] = review_df['price'].astype(float)
fig = plt.figure(figsize=(20,12))
ax1 = fig.add_subplot(1,1,1)
ax2 = ax1.twinx()
bar_width=0.4
x_pos = np.arange(10)
plt.title('Top Ten Listings with the Most Reviews: \n Review Count and Price of Listing', fontsize=20, pad = 15)
count_bars= ax1.bar(x_pos-(0.5*bar_width),
review_df.number_of_reviews,
bar_width,
color='tab:blue',
edgecolor='black',
label = 'Review Count'
)
price_bars= ax2.bar(x_pos+(0.5*bar_width),
review_df.price,
bar_width,
color='tab:orange',
edgecolor='black',
label = 'Price of Listing'
)
ax1.set_xticks(x_pos)
ax1.set_xticklabels(review_df.host_name, fontsize=11)
ax1.set_xlabel('Host Name', fontsize=18)
ax1.set_ylabel('Count of Reviews', fontsize=18, labelpad=20)
ax1.tick_params(axis='y', labelsize=14)
ax2.set_ylabel('Price of Listing',rotation=270, fontsize=18, labelpad=20)
ax2.tick_params(axis='y', labelsize=14)
autolabel(price_bars, ax2,'.2f', '$')
autolabel(count_bars, ax1, '.0f', '')
plt.tight_layout()
plt.show()
This is an interactive map of the Airbnb listings for the Washington, D.C. area, where you can open a listing directly in your browser! Each point is a listing, colored by type. The points shown initially are the top ten groups:
-Top 10 Hosts (Red)
-Top 10 Reviews (Blue)
The layer control box will allow you to view a different map type (openstreetmap or cartodbpositron) or view different listing types.
Listings outside the top 10 groups can be viewed by room type:
-Entire home/apartment (Dark Blue)
-Private room (Purple)
-Hotel room (Yellow)
-Shared room (Pink)
If you hover over a listing, the tooltip will show the listing type, price, and rating. If you would like to book the Airbnb or view the listing, click on the point and a link will pop up. Click the link to open the listing in a new tab.
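The coloring rules above apply in a fixed order: the top ten groups take precedence, and room type is used only for the remaining listings. A minimal sketch of that decision, assuming a hypothetical helper `marker_style` and made-up IDs, could look like this:

```python
# Colors for listings outside the top ten groups, keyed by room type
ROOM_COLORS = {
    "Entire home/apt": "darkblue",
    "Private room": "purple",
    "Hotel room": "yellow",
    "Shared room": "pink",
}


def marker_style(listing_id, host_id, room_type, top_listing_ids, top_host_ids):
    """Return (color, label) for a listing, or None if it cannot be classified."""
    if listing_id in top_listing_ids:
        return ("blue", "Heavily Reviewed")
    if host_id in top_host_ids:
        return ("red", "Established Host")
    color = ROOM_COLORS.get(room_type)
    if color is None:
        return None  # unknown or missing room type: skip the marker
    return (color, room_type)


# Hypothetical IDs for illustration only
top_listings = {101}
top_hosts = {7}
style_a = marker_style(101, 7, "Private room", top_listings, top_hosts)
style_b = marker_style(102, 7, "Private room", top_listings, top_hosts)
style_c = marker_style(103, 8, "Shared room", top_listings, top_hosts)
style_d = marker_style(104, 9, None, top_listings, top_hosts)
```

Returning `None` for an unclassifiable row makes it explicit that no marker should be drawn, instead of relying on a color left over from an earlier listing.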
top_10_most_reviewed_listings=list(review_df['id'])
top_10_hosts=list(ten_host_df['host_id'])
# MAP 2
map_df=df_listings[['id','host_id','latitude','longitude', 'listing_url','price','review_scores_value', 'room_type']]
center_of_map = [38.9072,-77.0369]
my_map2 = folium.Map(location = center_of_map,
zoom_start = 12,
width = '90%',
height = '100%',
left = '5%',
right = '5%',
top = '0%',
no_touch=True)
tiles = ['cartodbpositron','openstreetmap']
for tile in tiles:
    folium.TileLayer(tile).add_to(my_map2)
fg = folium.FeatureGroup(name='Top 10 Listing Types')
fg2 = folium.FeatureGroup(name='Room Type',
show = False)
fg.add_to(my_map2)
fg2.add_to(my_map2)
folium.LayerControl(collapsed=False).add_to(my_map2)
for i in range(0, len(map_df)):
    listing = map_df.loc[i,'id']
    host = map_df.loc[i,'host_id']
    room = map_df.loc[i, 'room_type']
    # Top-ten groups take precedence over room type
    if listing in top_10_most_reviewed_listings:
        color = 'blue'
        label_ = 'Heavily Reviewed'
    elif host in top_10_hosts:
        color = 'red'
        label_ = 'Established Host'
    elif room == 'Private room':
        color = 'purple'
    elif room == 'Hotel room':
        color = 'yellow'
    elif room == 'Entire home/apt':
        color = 'darkblue'
    elif room == 'Shared room':
        color = 'pink'
    else:
        # Skip rows with an unrecognized room type rather than reuse a stale color
        continue
    try:
        if color in ['red','blue']:
            # Top-ten listings go in the layer that is shown by default
            folium.Circle(location=[map_df.loc[i,'latitude'],
                                    map_df.loc[i,'longitude']],
                          tooltip="{}<br>Price: {} <br> Rating: {}<br> Click to Show Link!".format(label_,map_df.loc[i,'price'],
                                                                                                   map_df.loc[i,'review_scores_value']),
                          popup='<a href={} target="_blank"><u>Airbnb_Link</a>'.format(map_df['listing_url'].iloc[i]),
                          radius=50,
                          color=color,
                          fill=True,
                          fill_color=color,
                          fill_opacity=0.5).add_to(fg)
        else:
            # All other listings go in the room-type layer
            folium.vector_layers.Circle(location=[map_df.loc[i,'latitude'], map_df.loc[i,'longitude']],
                                        tooltip="{}<br>Price: {} <br> Rating: {}<br> Click to Show Link!".format(map_df.loc[i,'room_type'],
                                                                                                                 map_df.loc[i,'price'],
                                                                                                                 map_df.loc[i,'review_scores_value']),
                                        popup='<a href={} target="_blank"><u>Airbnb_Link</a>'.format(map_df['listing_url'].iloc[i]),
                                        radius=50,
                                        color=color,
                                        fill=True,
                                        fill_color=color,
                                        fill_opacity=0.5).add_to(fg2)
    except Exception:
        # Skip any listing whose marker cannot be built
        pass
my_map2.save(path + 'Dots_DC.html')
webbrowser.open('/Users/Nolan/DS736/Python_datafiles/Dots_DC.html')