Background

library(osmdata)
library(sf)
library(ggplot2)
library(dplyr)

options(timeout=600)

# Bounding box for Amsterdam, reused by all three Overpass queries
amsterdam_bb <- getbb('Amsterdam Netherlands')

# Major roads
streets <- amsterdam_bb %>%
  opq() %>%
  add_osm_feature(key='highway',
                  value=c('motorway', 'primary',
                          'secondary', 'tertiary')) %>%
  osmdata_sf()

# Minor roads and footpaths
small_streets <- amsterdam_bb %>%
  opq() %>%
  add_osm_feature(key='highway',
                  value=c('residential', 'living_street',
                          'service', 'footway')) %>%
  osmdata_sf()

# Water bodies (canals, rivers, lakes)
rivers <- amsterdam_bb %>%
  opq() %>%
  add_osm_feature(key='natural',
                  value='water') %>%
  osmdata_sf()

# Layer roads and water on a black background, cropped to central Amsterdam
ggplot() +
  geom_sf(data=streets$osm_lines,
          inherit.aes = FALSE,
          color = '#FF5A5F',
          size = .5,
          alpha = .6) +
  geom_sf(data=small_streets$osm_lines,
          inherit.aes = FALSE,
          color = '#FF5A5F',
          size = .2,
          alpha = .6) +
  geom_sf(data=rivers$osm_polygons,
          inherit.aes = FALSE,
          fill='white',
          size = .2) +
  coord_sf(ylim=c(52.35, 52.40),
           xlim=c(4.83, 4.97),
           expand=FALSE) +
  theme(panel.background = element_rect(color='black',
                                        fill='black'),
        panel.grid = element_blank(),
        axis.ticks = element_blank(),
        axis.text  = element_blank(),
        plot.title = element_text(size=18,
                                  face='bold',
                                  hjust=.5,
                                  color='black'))

Fig.1. Map of Amsterdam

Amsterdam Overtourism

Overtourism is a polarizing issue plaguing the city of Amsterdam, sparking widespread debate and concern among both residents and policymakers.¹ Overtourism is defined as a situation in which too many tourists travel to a popular destination, causing the place to suffer negative environmental, economic, and sociocultural impacts.² In 2019, the ratio of tourists to Amsterdam inhabitants was a staggering 2,341% (Fig. 2), meaning that for every local there were roughly 23 tourists.³ This disproportionate influx has led to tourism-based gentrification in Amsterdam, a phenomenon in which residents are displaced as services and housing are increasingly tailored to tourist demand rather than to the needs of the local population. The consequences have been significant: a dwindling housing stock, spiraling rental prices (Fig. 3), and a rise in noise-ordinance violations and public disturbances.⁴

Overtourism Stats


Fig.2. Amsterdam’s number of inhabitants vs inbound overnight stays

Average Rent Increase

Fig.3. Average rent price of residential property in Major Cities

Amsterdam Airbnb Stats

The dramatic surge in tourist numbers has unsurprisingly correlated with a rise in short-term rental accommodations, including Airbnb listings, which have grown exponentially over the years. This growth is underscored by the rise in Airbnb guests throughout the Netherlands, from 75,000 in 2012 to 1.6 million in 2017.⁵ Specifically, within Amsterdam itself, Airbnb property listings have skyrocketed from 2,400 listings in 2012 to an overwhelming 19,619 by 2019⁶, reflecting the platform’s substantial impact on the city’s housing market.

Amsterdam Airbnb Listings

Fig.4. Amsterdam’s rising Airbnb listings

Airbnb Listing vs Searches

Fig.5. Amsterdam Airbnb Listings vs Searches

Short-Term Rental Regulations

To mitigate the mounting pressure from the influx of Airbnb listings and the resulting housing issues, Amsterdam took proactive steps, becoming the world's first city to establish regulations on Airbnb rentals and forming a partnership with the platform itself. This pioneering move sought to regulate a short-term rental market that had, until recently, operated with little oversight, and to address the community's concerns over noise disturbances, the shrinking supply of residential housing, and the preservation of neighborhood character. These regulations include:⁷ ⁸

Listings cannot be rented out for more than 30 days a year: If a listing reaches or exceeds its yearly allotment of rental days, the city of Amsterdam can request that Airbnb block the listing's calendar for the remainder of the year.
Hosts cannot rent out multiple properties: Hosts are not allowed to rent out additional properties such as summer houses, tents, or houseboats.
All listings must have a permit and registration number: For a home to be listed on Airbnb, the host needs to obtain a vacation rental permit from the city of Amsterdam. These permits are temporary and must be renewed annually.
Rental must be the host's main residence: The host must be registered as living at the listing address in the Personal Records Database (BRP) of the municipality of Amsterdam.
Notification of booking: Under Amsterdam's housing regulations, hosts are required to notify the city every time they rent out their home, before the first day of check-in.
Housing corporation and social housing rentals cannot be listed: Rental properties from housing corporations, social housing, or rent-controlled apartments may not be listed as Airbnb rentals.

Fig.6. Fairbnb logo

Fairbnb

While Amsterdam has taken a step forward by establishing these regulations, the city's next challenge lies in developing and implementing effective monitoring and enforcement mechanisms to ensure that the regulations address the challenges posed by short-term rentals and their contribution to overtourism. Given the complexity of regulating Airbnb and similar platforms, Amsterdam needs to explore innovative solutions that bridge the gap between policy and enforcement. Fairbnb emerges as a groundbreaking app crafted to align with Amsterdam's regulatory framework. The application is not merely a response to existing challenges but a forward-thinking approach designed to complement and enhance the city's efforts to regulate short-term rentals. Fairbnb's features enable precise tracking of rental days, ensure that hosts adhere to the one-rental-property rule, and introduce a new fair-market pricing feature accompanied by an additional policy recommendation.

Fairbnb Application Interface

Fairbnb Startup Screen

Fig.7. Startup Screen

Fairbnb Mapping Feature

Fig.8. Mapping Feature

Tracking Rental Days

Fairbnb is designed to meticulously monitor listings to ensure they do not exceed the 30-day annual rental allotment. This is achieved through a comprehensive tracking system that utilizes data provided by the city of Amsterdam, directly from Airbnb, and additional information obtained through web scraping. The app features a state-of-the-art alert system that informs the city of Amsterdam about listings that are approaching, have reached, or have exceeded the 30-day limit. Once a listing reaches this threshold, Fairbnb will promptly notify Airbnb, enabling them to block the listing’s calendar for the remainder of the year.
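The alert logic described above can be illustrated with a short R sketch. It is not Fairbnb's actual implementation: the `bookings` table, its `listing_id` and `nights_booked` columns, and the 25-night "approaching" threshold are hypothetical assumptions; only the 30-night limit comes from the regulations.

```r
# Hypothetical sketch of the 30-night allotment alert logic.
# Assumes one row per listing with the nights booked so far this calendar year;
# all names and the "approaching" threshold are illustrative assumptions.
annual_limit <- 30   # yearly allotment under the Amsterdam rules
warn_at      <- 25   # assumed "approaching" threshold (hypothetical)

classify_allotment <- function(nights_booked, limit = annual_limit, warn = warn_at) {
  ifelse(nights_booked > limit, "exceeded",
         ifelse(nights_booked == limit, "reached",
                ifelse(nights_booked >= warn, "approaching", "ok")))
}

bookings <- data.frame(
  listing_id    = c("A1", "B2", "C3", "D4"),
  nights_booked = c(12, 27, 30, 34),
  stringsAsFactors = FALSE
)
bookings$status <- classify_allotment(bookings$nights_booked)

# Listings Airbnb would be asked to block for the rest of the year
to_block <- bookings$listing_id[bookings$status %in% c("reached", "exceeded")]
print(bookings)
print(to_block)
```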

Fairbnb Listing Tracking Feature

Fairbnb Listings at Risk

Fig.9. Fairbnb rental allotment tracker

Fairbnb Listings at or over Max

Fig.10. Fairbnb rental allotment tracker

Tracking Multiple Rental Properties

Fairbnb is also designed to identify hosts who list multiple properties by utilizing an advanced detection system that analyzes data provided by the city of Amsterdam, Airbnb, and additional data collected with web-scraping techniques. By cross-referencing this information, the app can accurately pinpoint hosts who might be in violation of the one-property rule. Upon identifying such cases, Fairbnb sends notifications to the city authorities, providing them with detailed insights about potential violations. In addition, the app communicates with Airbnb to initiate appropriate actions against these listings.
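A minimal R sketch of the multiple-property check follows, assuming a listing table that carries a host identifier. The column names `host_id` and `listing_id` and the toy data are assumptions for illustration; a real check would also cross-reference the city's permit and BRP registration data, which is not shown here.

```r
# Hypothetical sketch: flag hosts with more than one active listing.
# Column names (host_id, listing_id) and the data are illustrative assumptions.
listings_demo <- data.frame(
  listing_id = c(101, 102, 103, 104, 105),
  host_id    = c("h1", "h2", "h2", "h3", "h2"),
  stringsAsFactors = FALSE
)

# Count distinct listings per host
per_host <- aggregate(listing_id ~ host_id, data = listings_demo,
                      FUN = function(x) length(unique(x)))
names(per_host)[2] <- "n_listings"

# Hosts potentially violating the one-property rule
flagged_hosts <- per_host[per_host$n_listings > 1, ]
print(flagged_hosts)
```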


Fig.11. Fairbnb rental property tracker feature

Introducing Fair-Market Pricing

Beyond assisting in the monitoring and enforcement of existing regulations, Fairbnb proposes an innovative addition: a fair pricing regulation, supported by a corresponding feature within the app. This initiative focuses on monitoring and curbing price gouging in Airbnb rentals. It would serve as a deterrent for hosts contemplating short-term rentals, thereby helping to alleviate the shrinking housing stock and overtourism. The proposed system will leverage historical Airbnb listing and pricing data, complemented by data obtained through web scraping, to establish a fair-market daily rental price. The mechanics of this fair-market pricing system will be elaborated further in the methodologies section of the proposal.
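Since the methodology is developed later, the following is only a hedged sketch of how a fair-market ceiling could work, assuming a hedonic OLS model of log daily price (like the one built in this report) serves as the fair-price estimator. The data are simulated, and the 25% tolerance and the `gouging` flag are hypothetical choices, not part of the proposal.

```r
# Hypothetical sketch: flag possible price gouging relative to a model-based
# fair-market price. Data are simulated; the 25% tolerance is an assumption.
set.seed(1)
n <- 200
demo <- data.frame(
  accommodates = sample(1:6, n, replace = TRUE),
  bedrooms     = sample(0:3, n, replace = TRUE)
)
# Simulated log prices, with the first five listings artificially inflated
demo$log_price <- 3.8 + 0.15 * demo$accommodates + 0.10 * demo$bedrooms +
  rnorm(n, sd = 0.15)
demo$log_price[1:5] <- demo$log_price[1:5] + 0.8

# Fit an OLS model on log price and back-transform the prediction
fit <- lm(log_price ~ accommodates + bedrooms, data = demo)
demo$fair_price   <- exp(predict(fit, newdata = demo))
demo$actual_price <- exp(demo$log_price)

tolerance <- 0.25   # hypothetical allowed markup over the fair price
demo$gouging <- demo$actual_price > (1 + tolerance) * demo$fair_price
head(demo[demo$gouging, c("actual_price", "fair_price")])
```

Because the model is fit on log price, `exp()` of a prediction estimates a typical (median-like) price rather than the mean. The tolerance sets the trade-off: too low, and ordinary price variation gets flagged as gouging; too high, and genuine markups slip through.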

Fig.12. Fairbnb fair-market pricing features

Data Wrangling

In the data wrangling section of our analysis, we integrate a variety of datasets to gain a comprehensive understanding of Amsterdam's short-term rental market, with a particular focus on Airbnb's impact. Finding appropriate data presented a significant challenge, primarily because of the language barrier, as most of the relevant datasets were in Dutch. Our primary data sources include Statistics Netherlands (CBS) Open Data, which offers a wealth of socio-economic and demographic statistics, and Kaggle's Airbnb Amsterdam dataset, a collection of scraped Airbnb listing information that provides detailed insight into the dynamics of the short-term rental market. Additionally, we use the Amsterdam City Open Geo Data, a resource rich in urban geographic information, to map and analyze the spatial distribution and impact of Airbnb rentals across the city. Finally, the CBS StatLine portal supplements our analysis with more nuanced, customizable statistical data. The data wrangling process involved meticulous cleaning and, in some cases, translating data from Dutch to English. This included converting data formats (such as downloading data as CSV files), transforming data types (e.g., from character to numeric), and extensive string manipulation to enable joins between datasets. We then engineered variables such as daily price, amenities count, and multiple ring buffers measuring each listing's distance to the city center to provide deeper analytical insight.

# Packages used in the data-wrangling steps below
library(dplyr)
library(sf)
library(stringr)
library(tidyr)
library(scales)

# Local data directories (adjust to your own machine)
airbnb_dir <- "C:\\Users\\fatbo\\OneDrive\\Desktop\\Airbnb\\Airbnb"
repo_data_dir <- "C:\\Users\\fatbo\\OneDrive\\Documents\\GitHub\\Final_PPA\\Data"

listings <- read.csv(file.path(airbnb_dir, "listings.csv"))

listings_details <- read.csv(file.path(airbnb_dir, "listings_details.csv"))

nhoods <- st_read(file.path(airbnb_dir, "neighbourhoods.geojson"))

joined <- left_join(listings, listings_details, by = "id", suffix=c("", ".y")) %>% select(-ends_with(".y"))

attraction_distance <- read.csv(file.path(repo_data_dir, "Airbnb", "distance_attrac.csv"))

income <- read.csv(file.path(repo_data_dir, "Amsterdam_income.csv"))

census <- read.csv(file.path(repo_data_dir, "final_census.csv"))

# Joining data sets and converting to the CRS of the neighbourhood layer
attraction_distance <- attraction_distance %>% transform(zipcode = as.character(zipcode))

listings <- left_join(listings, listings_details, by = "id", suffix=c("", ".y")) %>% select(-ends_with(".y"))

listings <- left_join(listings, attraction_distance, by = "zipcode", suffix=c("", ".y")) %>%
  select(-ends_with(".y")) %>%
  st_as_sf(coords = c('longitude', 'latitude'), crs = 4326) %>%
  st_transform(st_crs(nhoods))

listings <- st_join(listings, nhoods, join = st_within, suffix=c("", ".y")) %>% select(-ends_with(".y"))

# Share of listings with and without a square-footage value
# (scales::percent() returns text, so precision is set via `accuracy`, not round())
total_listings <- nrow(listings)
sqft_na <- percent(sum(is.na(listings$square_feet)) / total_listings, accuracy = 0.01)
sqft_non_na <- percent(1 - sum(is.na(listings$square_feet)) / total_listings, accuracy = 0.01)

sqft_table <- tibble(sqft_na, sqft_non_na)


# Converting census columns 3 to 35 into numeric
census[3:35] <- lapply(census[3:35], as.numeric)

# Grouping and summarizing census data per zip code
census <- census %>%
  group_by(PostalCode) %>%
  summarize(married        = sum(Married, na.rm = TRUE),
            migration      = sum(Persons_with_a_migration_background, na.rm = TRUE),
            vio_sex_crimes = sum(Violentandsexualcrimes, na.rm = TRUE),
            cafes          = mean(cafe_Within_1_km, na.rm = TRUE),
            restaurants    = mean(restaurants_Within_1_km, na.rm = TRUE),
            hotels         = mean(hotels_Within_5_km, na.rm = TRUE),
            road           = mean(Distance_to_main_road_entrance, na.rm = TRUE),
            train          = mean(Distance_to_train_station, na.rm = TRUE)) %>%
  filter(!PostalCode %in% ".")


# Formatting zip code correctly and changing column name
census$PostalCode <- str_extract(census$PostalCode, "^.{4}")

listings$zipcode <- str_extract(listings$zipcode, "^.{4}")  

colnames(census)[1] <- "zipcode"

# Joining listings with zip codes
listings <- left_join(listings, census, by = "zipcode")

# Adding the number of amenities per listing. Amenities are stored as "{a,b,c}",
# so the count is the number of commas + 1; an empty list ("{}") counts as 0
listings$amenities_count <- ifelse(listings$amenities %in% c("", "{}"), 0L,
                                   str_count(listings$amenities, ",") + 1L)

# Converting price columns 67-72 (including weekly price, monthly price, cleaning fee,
# and security deposit) to numeric: drop the leading "$", treat empty strings as 0,
# and strip all thousands separators
for (i in 67:72) {
  listings[[i]] <- sub("^.", "", listings[[i]])
  listings[[i]] <- as.numeric(gsub(",", "", ifelse(nchar(listings[[i]]) == 0, "0", listings[[i]])))
}

# Remove rows with no rent data
listings <- listings %>% filter(weekly_price > 0 | monthly_price > 0)

# Calculating a daily price from either weekly or monthly price
listings$daily_price <- ifelse(listings$weekly_price > 0, listings$weekly_price / 7, listings$monthly_price / 30)

# Filtering for outliers
listings <- listings %>% filter(daily_price > 0 & daily_price < 1500)

# Selecting a list of all relevant variables
var <- c("zipcode", "daily_price", "room_type", "host_is_superhost", "host_identity_verified", "accommodates", "beds", "bed_type", "cancellation_policy", "number_of_reviews", "property_type", "bathrooms", "bedrooms", "amenities_count", "review_scores_rating", "has_availability", "guests_included", "security_deposit", "cleaning_fee", "extra_people","neighbourhood", "availability_30", "weekly_price", "monthly_price", "near_ATTRAC", "near_museum", "near_Performing_arts", "married", "migration", "vio_sex_crimes", "cafes", "restaurants", "hotels", "road", "train")

listings <- listings %>% select(all_of(var)) 


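# Removing listings with NA cleaning fee or security deposit (important regression
# variables) and recording how many listings were dropped
n_before_na <- nrow(listings)
listings <- listings %>% drop_na(cleaning_fee, security_deposit)
n_dropped_na <- n_before_na - nrow(listings)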


# Creating a table with the average security deposit for listings above vs at or below $100 per day
above_100 <- listings %>% filter(daily_price > 100)
below_100 <- listings %>% filter(daily_price <= 100)

deposit_above_100 <- round(mean(above_100$security_deposit),2)
deposit_blw_100 <- round(mean(below_100$security_deposit),2)

deposit_table <- tibble(deposit_above_100, deposit_blw_100)

Exploratory Data Analysis

library(psych)
library(kableExtra)

variables_to_summarize <- c("daily_price", "room_type", 
                            "near_ATTRAC", 
                            "number_of_reviews", "property_type", 
                            "bathrooms", "bedrooms",
                            "review_scores_rating", 
                            "cancellation_policy", "host_identity_verified",
                            "accommodates", "beds", "bed_type",
                            "host_is_superhost", "cleaning_fee",
                            "security_deposit", "extra_people", 
                            "zipcode", "availability_30", "neighbourhood")

# Descriptive statistics (psych::describe) on a plain table without geometry
summary_stats_df <- as.data.frame(st_drop_geometry(listings))
summary_table <- describe(summary_stats_df[, variables_to_summarize])

# Round all statistics to two decimals
rounded_summary <- round(as.data.frame(summary_table), 2)

# Create the table using kableExtra
rounded_summary %>%
  kbl(caption = "Table 1: Summary stats") %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")

Table 1: Summary stats
	vars	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
daily_price	1	3029	115.33	65.21	100.0	105.22	42.36	14.29	857.14	842.86	2.99	16.74	1.18
room_type*	2	3029	1.17	0.38	1.0	1.08	0.00	1.00	3.00	2.00	1.91	2.01	0.01
near_ATTRAC	3	1113	2.43	1.17	2.3	2.29	0.89	0.90	8.20	7.30	2.15	7.00	0.04
number_of_reviews	4	3029	41.46	65.46	20.0	26.26	22.24	0.00	695.00	695.00	3.59	16.76	1.19
property_type*	5	3029	3.25	4.50	1.0	2.23	0.00	1.00	18.00	17.00	1.76	1.65	0.08
bathrooms	6	3024	1.12	0.34	1.0	1.04	0.00	0.00	4.00	4.00	2.96	11.77	0.01
bedrooms	7	3025	1.52	0.94	1.0	1.37	0.00	0.00	10.00	10.00	1.96	7.27	0.02
review_scores_rating	8	2879	95.17	4.76	96.0	95.86	2.97	40.00	100.00	60.00	-2.61	14.03	0.09
cancellation_policy*	9	3029	2.33	0.70	2.0	2.41	1.48	1.00	3.00	2.00	-0.55	-0.84	0.01
host_identity_verified*	10	3029	1.63	0.48	2.0	1.67	0.00	1.00	2.00	1.00	-0.56	-1.69	0.01
accommodates	11	3029	2.95	1.40	2.0	2.78	0.00	1.00	16.00	15.00	3.19	22.45	0.03
beds	12	3029	1.98	1.56	1.0	1.68	0.00	1.00	20.00	19.00	3.56	23.33	0.03
bed_type*	13	3029	3.98	0.18	4.0	4.00	0.00	1.00	4.00	3.00	-9.46	98.35	0.00
host_is_superhost*	14	3029	1.19	0.40	1.0	1.12	0.00	1.00	2.00	1.00	1.55	0.39	0.01
cleaning_fee	15	3029	33.75	25.35	30.0	31.60	22.24	0.00	250.00	250.00	1.21	4.65	0.46
security_deposit	16	3029	171.86	270.98	100.0	123.62	148.26	0.00	4489.00	4489.00	5.39	55.55	4.92
extra_people	17	3029	14.01	22.26	0.0	9.79	0.00	0.00	258.00	258.00	3.18	20.86	0.40
zipcode*	18	2971	31.26	19.29	29.0	30.66	26.69	1.00	74.00	73.00	0.16	-1.09	0.35
availability_30	19	3029	4.24	7.45	0.0	2.41	0.00	0.00	30.00	30.00	1.94	2.89	0.14
neighbourhood*	20	3029	11.44	6.22	9.0	11.12	5.93	1.00	22.00	21.00	0.49	-1.24	0.11

# Histogram of the unlogged daily price
# (airbnb_color is assumed to be defined in an earlier setup chunk)
library(ggpubr)

gghistogram(
  data = listings, x = "daily_price", 
  add = "mean", rug = TRUE,
  fill = airbnb_color) +
  labs(title = "Daily AirBnB Price Distribution Amsterdam",
       x = "Daily Price", y = "Count")

Fig.13

We constructed our dependent variable, the daily price of renting an AirBnB in Amsterdam, by dividing the weekly price by seven or the monthly price by thirty. Because longer bookings usually carry larger discounts, and a week is shorter than a month, a daily price derived from the weekly price should be closer to the actual nightly rate. We therefore calculated the daily price from the weekly price whenever it was available and normalized the monthly price to a daily price only when the weekly price was missing.

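When visualizing the daily price, we see a heavy right skew. OLS does not require the dependent variable itself to be normally distributed, but a strongly skewed outcome tends to produce skewed, heteroskedastic residuals that weaken the model's fit and make its significance tests less reliable. An approximately normal dependent variable avoids much of this.

We therefore log-transform our dependent variable.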
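To illustrate why the transformation helps, the sketch below uses simulated lognormal prices (not the Airbnb data; the distribution parameters are arbitrary) and compares sample skewness before and after taking logs.

```r
# Illustration on simulated right-skewed prices (not the Airbnb data):
# compare sample skewness before and after a log transform.
set.seed(42)
price <- rlnorm(3000, meanlog = 4.6, sdlog = 0.5)   # lognormal, right-skewed

skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^(3 / 2)
}

round(c(raw = skewness(price), logged = skewness(log(price))), 2)

# Back-transforming a mean of logs gives the geometric mean, which is
# smaller than the arithmetic mean for right-skewed data
c(arithmetic = mean(price), geometric = exp(mean(log(price))))
```

The last line also shows that back-transforming a mean of logs, as `avg_price = exp(mean(daily_price))` does in the neighbourhood summary, yields the geometric mean, which lies below the arithmetic mean for right-skewed prices.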

# Log-transform the daily price and plot its distribution
listings$daily_price <- log(listings$daily_price)

gghistogram(
  data = listings, x = "daily_price", 
  add = "mean", rug = TRUE,
  fill = airbnb_color) +
  labs(title = "Log Daily AirBnB Price Distribution Amsterdam",
       x = "Log Daily Price", y = "Count")

Fig.14

The log-transformed daily price shows an approximately normal distribution and will therefore be used as the dependent variable for the regression model.

Independent Variables (Predictors)

We want to build a hedonic housing price model that includes (1) internal features of the listing, (2) the spatial process, (3) demographics, (4) centrality, and (5) infrastructural variables.

To pick meaningful predictors, we set up a correlation matrix and look for variables that are highly correlated with our dependent variable. Note that the sign of the correlation does not matter: strong negative and strong positive correlations both carry potentially valuable information for the model.
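The selection rule can be sketched as ranking predictors by the absolute value of their correlation with the dependent variable. The example below uses R's built-in `mtcars` data with `mpg` as the outcome purely for illustration; `rank_by_abs_cor()` is a hypothetical helper, not part of our pipeline.

```r
# Sketch: rank numeric predictors by the absolute value of their correlation
# with the dependent variable (mtcars used for illustration only).
rank_by_abs_cor <- function(df, target) {
  num <- df[sapply(df, is.numeric)]                       # numeric columns only
  r <- cor(num, use = "pairwise.complete.obs")[, target]  # correlations with target
  r <- r[names(r) != target]                              # drop the target itself
  r[order(abs(r), decreasing = TRUE)]                     # sort by strength, ignoring sign
}

ranked <- rank_by_abs_cor(mtcars, "mpg")
round(ranked, 2)
```

Here the negatively correlated `wt` ranks first, which is exactly why the sign is ignored when screening predictors.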

In the next subsections, we will go through each of these points, explain which features we engineered, and whether the correlation matrix gives us a reason to include these features in our model.

# Calculating averages per neighbourhood on a plain table (geometry dropped).
# avg_price back-transforms the mean log price, so it is a geometric mean.
avg_nhood <- listings %>%
  st_drop_geometry() %>%
  group_by(neighbourhood) %>%
  summarize(avg_road = mean(road, na.rm = TRUE), 
            avg_cafes = mean(cafes, na.rm = TRUE), 
            avg_restaurants = mean(restaurants, na.rm = TRUE), 
            avg_hotels = mean(hotels, na.rm = TRUE), 
            avg_train = mean(train, na.rm = TRUE), 
            avg_price = exp(mean(daily_price, na.rm = TRUE)),
            avg_attrac = mean(near_ATTRAC, na.rm = TRUE),
            avg_amenities = mean(amenities_count, na.rm = TRUE),
            avg_sex_crimes = mean(vio_sex_crimes, na.rm = TRUE),
            avg_migration = mean(migration, na.rm = TRUE),
            avg_married = mean(married, na.rm = TRUE))

neighbourhood_boundaries <- nhoods

# Adding neighbourhood boundary info via an attribute join on the neighbourhood name
neighbourhood_boundaries$neighbourhood <- as.character(neighbourhood_boundaries$neighbourhood)
avg_nhood$neighbourhood <- as.character(avg_nhood$neighbourhood)

merged_data_sf <- left_join(neighbourhood_boundaries, avg_nhood, by = "neighbourhood")

# Looping to save a map for each variable; the ggplots are stored in the list "maps"
# (1 = price, 2 = road, 3 = cafes, ..., 9 = crimes, 10 = migrants, 11 = married)
map_vars <- c("avg_price", "avg_road", "avg_cafes", "avg_restaurants", "avg_hotels", "avg_train",
              "avg_attrac", "avg_amenities", "avg_sex_crimes", "avg_migration", "avg_married")
maps <- lapply(map_vars, function(i) {
  ggplot() +
    geom_sf(data = merged_data_sf, aes(fill = .data[[i]])) +
    theme_void()
})

maps[[1]] +
  scale_fill_gradient(low = "white", high = airbnb_color, name = "Price Gradient in USD") +
  labs(title = "Avg. Daily AirBnB Price in Amsterdam")

Fig.15

library(corrr)

# Numeric variables only, without geometry or incomplete rows
corr_vars <- select_if(st_drop_geometry(listings), is.numeric) %>%
  na.omit() %>%
  select(-guests_included)

corr_vars %>% 
  correlate() %>% 
  autoplot() +
  geom_text(aes(label = round(r, digits = 2)), size = 2)

Fig.16

One interesting observation from the correlation matrix is that daily price is far more strongly correlated with weekly price (0.87) than with monthly price (0.4). This suggests that for most listings a weekly price was available, so we rarely had to infer the daily price from the monthly price. Given the long-term rental discounts discussed above, this is good for the model.

Using a threshold of |r| > 0.9, only restaurants and cafes show excessive collinearity.
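A collinearity screen at this threshold can be sketched in base R by listing every variable pair whose absolute correlation exceeds 0.9. `high_cor_pairs()` is a hypothetical helper, and `mtcars` stands in for our `corr_vars` table; in practice the dependent variable should be excluded before screening predictors.

```r
# Sketch: list predictor pairs whose absolute correlation exceeds a threshold
# (0.9, as in the text). mtcars is used for illustration only.
high_cor_pairs <- function(df, threshold = 0.9) {
  r <- cor(df[sapply(df, is.numeric)], use = "pairwise.complete.obs")
  # upper.tri() keeps each pair once and skips the diagonal
  idx <- which(abs(r) > threshold & upper.tri(r), arr.ind = TRUE)
  data.frame(
    var1 = rownames(r)[idx[, "row"]],
    var2 = colnames(r)[idx[, "col"]],
    r    = r[idx],
    stringsAsFactors = FALSE
  )
}

high_cor_pairs(mtcars)
```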

Hedonic Variables

Numeric Hedonic Predictors

For hedonic variables, we used the predictors in the given data set and engineered a new feature called amenities count, which we discuss below.

As expected, the correlation matrix shows that the number of guests a listing accommodates (0.55), bedrooms (0.54), beds (0.53), and bathrooms (0.33) are the four leading predictors. Interestingly, all of them are essentially proxies for the size of a listing. Square footage is usually among the top housing price predictors, but it is missing for most of our observations, so it makes sense that these best available size proxies have the highest absolute correlation with the daily price.

Another interesting observation is that the number of bathrooms is less strongly correlated with daily price than the other three predictors, a result that seemed counter-intuitive to us because most of our previous work was set in a North American context. Upon some research, there is a simple explanation: European houses, on average, have fewer bathrooms, even when normalized on a per-square-foot basis.⁹

The remaining hedonic variables that show some correlation are the cleaning fee (0.39) and the security deposit (0.2). Regarding cleaning fees, it makes sense that more expensive listings charge higher fees, as they likely have more expensive interiors and a greater area to clean.

What is surprising, however, is that the security deposit is only about half as strongly correlated with listing price as the cleaning fee. Intuitively, one would expect both charges to correlate with price about equally strongly.

There are several possible explanations for this difference. First, if hosts do not set a security deposit themselves, AirBnB calculates an automated amount of 60% of the nightly fee times the number of nights rented, capped at $1,000. As a result, expensive listings without a host-set deposit require only a relatively low deposit compared with their price. Second, high security deposits likely make a listing less attractive by raising the barrier to booking. Hosts try to make their listings attractive, and well-equipped, well-located listings with rather low security deposits are more likely to be booked than similar apartments with high deposits.
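The automated deposit rule described above can be written as a one-line function. The 60% share and $1,000 cap are taken from the text and have not been checked against Airbnb's current policy; the two example stays are made up.

```r
# Sketch of the automated deposit rule as described in the text:
# 60% of the nightly fee times the number of nights, capped at $1,000.
auto_deposit <- function(nightly_fee, nights, share = 0.6, cap = 1000) {
  pmin(share * nightly_fee * nights, cap)
}

stays <- data.frame(nightly_fee = c(100, 300), nights = c(3, 7))
stays$deposit <- auto_deposit(stays$nightly_fee, stays$nights)
# Deposit as a share of the total booking value
stays$deposit_share <- stays$deposit / (stays$nightly_fee * stays$nights)
stays
```

For the cheaper stay the deposit is 60% of the booking value, while for the more expensive stay the cap cuts it to under half. This is consistent with a weaker correlation between deposit and price.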

deposit_table %>%
  kbl(caption = "<strong><center>Table 2: Average Security Deposit, Listings Above vs. Below $100/Day</center></strong>", escape = FALSE, format = "html") %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")

**Table 2: Average Security Deposit, Listings Above vs. Below $100/Day**
deposit_above_100	deposit_blw_100
218.65	132.49

Surprisingly, the engineered amenities count feature, which counts the number of amenities in a listing, is only weakly correlated with price. This is likely because hosts list amenities inconsistently, and many low-priced homes include common amenities such as a couch or a TV. What could distinguish an expensive listing from an inexpensive one are "special features" such as pools or gaming rooms, but in a simple count a pool and a TV each add just one to the total.
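To make the counting problem concrete, the sketch below counts amenities by splitting the brace-delimited string (the format we assume the `amenities` column uses, e.g. `{TV,Wifi}`) and, separately, flags a few "special" amenities. The premium list is hypothetical and only illustrates how a flag could distinguish a pool from a TV where a count cannot.

```r
# Sketch: count amenities and flag a few "special" ones.
# Assumes amenities are stored as a brace-delimited, comma-separated string
# such as {TV,Wifi,Pool}; the premium list below is hypothetical.
count_amenities <- function(x) {
  inner <- gsub("^\\{|\\}$", "", x)   # strip the surrounding braces
  ifelse(nchar(inner) == 0, 0L, lengths(strsplit(inner, ",", fixed = TRUE)))
}

premium <- c("Pool", "Hot tub", "Gym")   # hypothetical "special features"
has_premium <- function(x) {
  sapply(x, function(s) any(sapply(premium, grepl, x = s, fixed = TRUE)),
         USE.NAMES = FALSE)
}

amenities <- c("{}", "{TV}", "{TV,Wifi,Kitchen}", "{TV,Pool}")
data.frame(amenities,
           n_amenities = count_amenities(amenities),
           has_premium = has_premium(amenities))
```

Splitting the string handles empty and single-item lists directly, whereas a pure comma count has to special-case them.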

Non-Numeric Hedonic Predictors

For non-numeric hedonic predictors, we look at the listing (room) type. This is an obvious predictor to pick, since shared rooms, for instance, are likely to be much less expensive than whole homes.

# One color per category; rainbow() provides enough colors for all neighbourhoods
colors <- rainbow(length(unique(listings$neighbourhood)))

faded_colors2 <- adjustcolor(colors, alpha.f = 0.6) 

# coord_cartesian() zooms to 0-200 without discarding listings,
# so the boxplot statistics are still computed on the full data
ggplot(data = listings, aes(x = room_type, y = exp(daily_price), fill = room_type)) + 
  geom_boxplot() +
  scale_fill_manual(values = faded_colors2, name = "Listing Type") + 
  theme_minimal() +
  coord_cartesian(ylim = c(0, 200)) +
  labs(y = "Daily Price in $", x = "", title = "Daily Price by Listing Type in $") +
  theme(
    axis.text.x = element_blank(), 
    legend.position = "bottom", 
    legend.title.align = 0.5, 
    plot.title = element_text(hjust = 0.5),
    panel.grid.major = element_blank()
  ) +
  guides(fill = guide_legend(nrow = 1, byrow = TRUE, title.position = "top"))

Fig. 18

Spatial process

A regression run purely on hedonic factors results in an R² of 0.4. This value is relatively low, and one could argue that including the spatial process could increase it significantly.

# Daily price by neighbourhood (back-transformed from log)
listings$neighbourhood <- factor(listings$neighbourhood)

faded_colors <- adjustcolor(colors, alpha.f = 0.4) 

# coord_cartesian() zooms without dropping listings from the boxplot statistics
ggplot(data = listings, aes(x = neighbourhood, y = exp(daily_price), fill = neighbourhood)) + 
  geom_boxplot() +
  scale_fill_manual(values = faded_colors, name = "Neighborhoods") + 
  theme_minimal() +
  coord_cartesian(ylim = c(0, 200)) +
  labs(y = "Daily Price in $", x = "", title = "Daily Price by Neighbourhood in $") +
  theme(
    axis.text.x = element_blank(), 
    legend.position = "bottom", 
    legend.title.align = 0.5, 
    plot.title = element_text(hjust = 0.5),
    panel.grid.major = element_blank()
  ) +
  guides(fill = guide_legend(nrow = 5, byrow = TRUE, title.position = "top"))

Fig. 19

We clearly see that neighbourhood explains some of the variation in daily price, but the effect appears weaker than what we would expect in some US cities.

Additionally, central districts such as Centrum-West are usually priced higher than neighbourhoods farther away. We therefore engineer a distance-to-center feature to capture this relationship in a later section.

Adding the neighbourhood effects and even a k-nearest-neighbor price feature only increases R² to 0.5. This implies that neighbourhood and k-nearest neighbors explain a significant portion of the variance in price, but far from most of it. In simple terms, our model is missing vital information, which makes sense: k-nearest-neighbor features are better predictors in very densely populated housing markets, where each observation has immediate neighbors that likely share the same features.

However, our data set is not that dense, so a listing's k nearest neighbors can be quite far away and may even lie in a different neighbourhood. Additionally, we are missing the square footage of each apartment, a core predictor in any price prediction.

One finding in support of this argument is that, in a later section, our regression accuracy increases by only 0.02 when we change k from 5 to 10 in the k-nearest-neighbor function. This implies that surrounding listings do little to explain the price of the listing itself.
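The k-nearest-neighbor features discussed here, together with the average distance to those neighbors (a direct check of how sparse the data are), can be sketched in base R. This is an illustrative helper, not the function used in our model: it assumes projected coordinates in metres and builds a full distance matrix, which is fine for a few thousand listings. For larger data, a package routine such as `nndist()` from spatstat.geom would be faster.

```r
# Sketch: k-nearest-neighbour spatial lag of price plus the mean distance to
# those neighbours. Base R only; coordinates are assumed to be projected (metres).
knn_features <- function(coords, price, k = 5) {
  d <- as.matrix(dist(coords))   # pairwise Euclidean distances
  diag(d) <- Inf                 # a listing is not its own neighbour
  feats <- t(apply(d, 1, function(row) {
    nn <- order(row)[seq_len(k)] # indices of the k nearest listings
    c(knn_price = mean(price[nn]), knn_dist = mean(row[nn]))
  }))
  as.data.frame(feats)
}

# Toy example: four listings on a line (coordinates in metres)
coords <- cbind(x = c(0, 1, 3, 10), y = 0)
price  <- c(10, 20, 30, 100)
knn_features(coords, price, k = 1)
knn_features(coords, price, k = 2)
```

A large average `knn_dist` relative to neighbourhood size would support the argument that the nearest listings are often too far away to share a listing's local characteristics.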


Given the above observations, we engineered demographic and infrastructural features to try to improve the model.

Demographics

The idea behind demographic features is that they could carry information about AirBnB prices, since guests are likely concerned about the safety and social environment of their AirBnB.

We therefore assume two things: first, that guests research their AirBnB carefully before booking and are willing to pay a premium for safer neighbourhoods; and second, that these demographic features are not evenly distributed across the city but show peaks and troughs depending on whether a neighbourhood is desirable, and therefore highly priced, or not.

With this in mind, we imported data on the percentage of migrants, the number of violent and sexual crimes, and the proportion of married residents for each listing. Surprisingly, these demographic variables are barely correlated with daily price.

As explained above, one reason could be that some of these factors are spread fairly evenly across expensive and inexpensive listings in the city and would therefore not help our model discriminate high-priced from low-priced listings.

Support for this comes from the maps comparing neighbourhoods where crimes occur with neighbourhoods that have a high proportion of married households.

Crime & Distance to Major Roads

Crime

# Map indices: 9 = crimes, 10 = migrants, 11 = married
maps[[9]] +
  scale_fill_gradient(low = "white", high = airbnb_color, name = "Average Crime") +
  labs(title = "Avg. Crimes by Neighborhood")

Fig. 20

We first observe that there is a hotspot for crimes in the same area in which there are many cafes, as we had previously predicted.

Distance to Major Roads

# Map index 2 = average distance to the main road entrance
maps[[2]] +
  scale_fill_gradient(low = "white", high = airbnb_color, name = "Average Distance") +
  labs(title = "Avg. Distance to Major Roads Per Neighborhood")

Fig. 21

We then see that the areas with high migrant population are not those where most crimes occur, and that areas with high migrant population also have high marriage rates.

Migrants & Marital Status

Migrant Population

# Map index legend: 9 = crimes, 10 = migrants, 11 = married
maps[[10]] +
  scale_fill_gradient(low = "white", high = airbnb_color, name = "Percentage of Migrants") +
  labs(title = "Avg. Percentage of Migrants per Neighborhood")

Fig. 22

Married Population

# Map index legend: 9 = crimes, 10 = migrants, 11 = married
maps[[11]] +
  scale_fill_gradient(low = "white", high = airbnb_color, name = "Percentage Married Population") +
  labs(title = "Avg. Married Residents per Neighborhood")

Fig. 23

However, marriage rates are high across the entire central-west side of the city. If our hypothesis that the most expensive listings are in the center holds, marriage would therefore be a weaker predictor than we had hoped.

Centrality

European cities are generally centralized. There is usually a central district in which most commerce, infrastructure, and tourism are concentrated, and the surrounding areas are largely organized to serve that district.

Given these properties, it is reasonable to assume that tourists are willing to pay a premium to stay close to the center. A strong public transportation network may weaken that willingness, but there should still be a visible trend.

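# Distance to the city center: rings every 200 CRS units (metres in a projected CRS)
# out to 11000 around Centrum-West
MRB <- multipleRingBuffer(st_union(listings %>% filter(neighbourhood == "Centrum-West")), 11000, 200)

# Each listing takes the `distance` value of the ring it falls in (the ring's outer edge)
listings <- st_join(listings, MRB, join = st_intersects) %>% 
            st_sf()

# Listings inside Centrum-West lie within the source polygon rather than in any ring
# (as do any beyond the last ring), so st_join() leaves their distance NA.
# Centrum-West listings are set to 0.
listings <- listings %>% mutate(distance = ifelse(neighbourhood == "Centrum-West", 0, distance))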

# Map of MRB and data points
#ggplot() + geom_sf(data = MRB) + geom_sf(data = listings)

ggplot() + 
  geom_line(
    data = listings %>% filter(distance < 6000), 
    aes(x = distance, y = exp(daily_price)),  # daily_price is stored on the log scale
    stat = "summary", 
    fun = mean,                               # mean daily price per distance ring
    color = airbnb_color,
    linewidth = 1.5                           # `size` for lines is deprecated since ggplot2 3.4.0
  ) +
  labs(
    title = "Mean Daily Price as a Function of Distance to Centrum-West"
  ) +
  xlab("Distance to Centrum-West (m)") +
  ylab("Mean Daily Price (USD)") +
  theme_bw(base_rect_size = .5) +
  theme(
    panel.grid.major = element_blank(), 
    panel.grid.minor = element_blank()
  )

Fig. 24

The graph shows a clear distance-to-center price gradient, but we make two observations. First, although the general trend slopes downward, the line fluctuates considerably. These fluctuations could reflect wealthy neighbourhoods, or they could be a sign that Amsterdam is a relatively “equal” city.

Second, most values lie between $80 and $120. That difference is not especially stark, given that one listing is in the heart of the center and another is roughly 6 km away. This supports our earlier suggestion that good public transport and evenly spread demographics may weaken the distance-to-center price gradient.
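To put a number on the gradient in Fig. 24, one could bin listings into the same 200 m rings and fit a straight line through price against distance. The sketch below does this in base R. The helper price_gradient(), its argument names, the toy values, and the assumption that distances are in metres are all illustrative, not taken from the report's code.

```r
# Hypothetical helper: mean price per distance ring plus a linear gradient.
# distance_m is assumed to be in metres; ring_width = 200 mirrors the
# multipleRingBuffer() interval used above.
price_gradient <- function(distance_m, price, ring_width = 200) {
  # Label each listing with the outer edge of its ring (e.g. 0-200 m -> 200);
  # listings at distance 0 (inside the center) get ring 0
  ring <- ceiling(distance_m / ring_width) * ring_width
  by_ring <- aggregate(price, by = list(ring = ring), FUN = mean)
  names(by_ring)[2] <- "mean_price"
  # Change in price per additional kilometre from the center
  slope_per_km <- unname(coef(lm(price ~ I(distance_m / 1000)))[2])
  list(by_ring = by_ring, slope_per_km = slope_per_km)
}

# Toy usage with made-up values
g <- price_gradient(c(0, 150, 900, 2500, 4000, 5800),
                    c(120, 118, 110, 100, 92, 85))
g$slope_per_km
```

A small negative slope relative to the price level would reflect the "not especially stark" gradient described above.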

This also answers the question from the previous section of why marriage and migration are weaker predictors than we had hoped. Both are high across the entire central-west of the city, whereas prices decline steadily, if not steeply, with distance from the center. In addition, there are highly valued listings in the east of the city where marriage and migration rates are quite low. This further weakens the link between these variables and daily price.

Besides safety, connectivity and access to public transport are very important for tourists.

Infrastructure

Getting from one attraction to the next usually requires public transportation. AirBnB hosts are aware of this. We assume that guests either pay a premium for good connectivity or accept poorer connectivity in exchange for a lower daily price.

We therefore engineered a variety of transportation features, including the distance from each listing's zipcode to the nearest major road and train station. We also added the distance from each listing's zipcode to the nearest tourist attraction, museum, and performing arts center, as well as the number of cafes and restaurants per zipcode.
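The two kinds of features described above, "distance to the nearest facility" and "count per zipcode", can be sketched in a few lines of base R. The helpers nearest_distance() and count_per_zip() and the toy inputs are hypothetical. They assume projected coordinates in metres stored as two-column matrices, whereas the report itself worked with sf objects.

```r
# Hypothetical helper: Euclidean distance from each row of from_xy to the
# closest row of to_xy (both n x 2 matrices of projected coordinates).
nearest_distance <- function(from_xy, to_xy) {
  apply(from_xy, 1, function(p) {
    min(sqrt((to_xy[, 1] - p[1])^2 + (to_xy[, 2] - p[2])^2))
  })
}

# Hypothetical helper: number of facilities (e.g. cafes) per zipcode
count_per_zip <- function(zip) {
  as.data.frame(table(zipcode = zip), responseName = "n")
}

# Toy usage
listing_xy <- rbind(c(0, 0), c(100, 0))
station_xy <- rbind(c(0, 30), c(100, 400))
nearest_distance(listing_xy, station_xy)
count_per_zip(c("1012", "1012", "1015"))
```

For thousands of listings, a spatial index (for example through sf) would be faster than this brute-force search, but the logic is the same.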

The distances to tourist attractions, museums, and performing arts centers showed no significant correlation with price. This suggests that customers are unlikely to pay a premium to stay close to these facilities. One reason could be that the city's transportation network is so good, with connections throughout the city, that there is little need to stay close to them.

maps[[2]] +
  scale_fill_gradient(low = "white", high = airbnb_color, name = "Price Gradient in USD") +
  labs(title = "Avg. Daily AirBnB Price in Amsterdam")

Fig. 25

Indeed, according to a Bloomberg article, Amsterdam has the second-best urban public transport system in Europe.¹⁰

This is corroborated by the average distance from each neighbourhood to the nearest main road.

maps[[2]] +
  scale_fill_gradient(low = "white", high = airbnb_color, name = "Distance in km") +
  labs(title = "Avg. Distance to Main Road")

Fig. 26

What about restaurants, cafes, and hotels?

# Map index legend: 3 = cafes, 4 = restaurants, 5 = hotels
maps[[3]] +
  scale_fill_gradient(low = "white", high = airbnb_color, name = "Number of Cafes") +
  labs(title = "Number of Cafes per Neighborhood")

Fig. 27

# Map index legend: 3 = cafes, 4 = restaurants, 5 = hotels
maps[[4]] +
  scale_fill_gradient(low = "white", high = airbnb_color, name = "Number of Restaurants") +
  labs(title = "Number of Restaurants per Neighborhood")

Fig. 28

Restaurants and cafes are, as expected, concentrated around the central district. However, there are also districts in the mid-west and mid-east of the city that have high housing prices but also many restaurants and cafes.

# Map index legend: 1 = price
maps[[1]] +
  scale_fill_gradient(low = "white", high = airbnb_color, name = "Price Gradient in USD") +
  labs(title = "Avg. Daily AirBnB Price in Amsterdam")

Fig. 29

This weakens the number of restaurants and cafes as predictors. In addition, there are tracts in the northwest of the city with very high daily rates but very few restaurants.

In conclusion, engineering non-hedonic features proved demanding. Many factors appear to be evenly distributed across the city, leaving few that clearly separate high-priced from low-priced listings. This is arguably a good sign: it suggests that the city is fairly equitable, and that a listing's internal characteristics, rather than demographics such as migration, drive prices.

Preparation for Regression Model

# Checking whether categorical variables have too many categories
roomtype <- unique(listings$room_type)
length(roomtype)

property_type <- unique(listings$property_type)
length(property_type)

# Number of listings with missing square footage
sum(is.na(listings$square_feet))

# Flag whether the host charges for extra guests ("$0.00" means no extra cost)
listings <- listings %>% mutate(extra_people_is_cost = ifelse(extra_people %in% c("$0.00"), "No", "Yes"))

variables_to_summarize <- c("room_type", "beds", "accommodates", "bed_type",
                            "bathrooms", "bedrooms", "review_scores_rating", "property_type")

# Number of distinct values per variable
unique_values_df <- data.frame(
  Variable = variables_to_summarize,
  Unique_Values = vapply(variables_to_summarize,
                         function(var) length(unique(listings[[var]])),
                         integer(1), USE.NAMES = FALSE)
)

unique_values_df %>%
  kbl(caption = "<center><strong>Table 3: Unique Values per Variable</strong></center>", escape = FALSE, format = "html") %>%
  kable_classic(full_width = F, html_font = "Cambria")

In the last step of this section, we check whether some categorical features have levels that occur only rarely. Rare levels cause trouble when the data are partitioned into training and test sets: a level that appears only in the test set cannot be predicted by a model that never saw it during training. Bed type, property type, and room type are potential troublemakers, so we filter out their rare levels as the first step of the methodology section.
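Instead of listing rare levels by hand, one could detect them automatically with a frequency threshold. The sketch below assumes a hypothetical cut-off of 10 listings per level, chosen only for illustration. The helpers rare_levels() and drop_rare() are not part of the report's code.

```r
# Hypothetical helper: levels of x that occur fewer than min_n times
rare_levels <- function(x, min_n = 10) {
  counts <- table(x)
  names(counts)[counts < min_n]
}

# Hypothetical helper: drop rows containing a rare level in any of `cols`,
# so that a random train/test split cannot produce unseen levels
drop_rare <- function(df, cols, min_n = 10) {
  keep <- rep(TRUE, nrow(df))
  for (col in cols) {
    keep <- keep & !(df[[col]] %in% rare_levels(df[[col]], min_n))
  }
  df[keep, , drop = FALSE]
}

# Toy usage with made-up rows
toy <- data.frame(room_type = c(rep("Entire home/apt", 12), "Shared room"),
                  bed_type  = c(rep("Real Bed", 12), "Airbed"))
nrow(drop_rare(toy, c("room_type", "bed_type")))
```

A data-driven threshold adapts automatically if the data set is refreshed. The trade-off is that the threshold itself becomes a tuning choice.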

In conclusion, the feature engineering section suggests several things about Amsterdam: accessibility is good, prices are fairly evenly distributed across neighbourhoods, and neighbourhood characteristics such as sex crimes or the number of cafes and restaurants are evenly spread. Together, these point to a relatively equitable urban space.

Methods & Results

Training Regression

# Partitioning the data set into training (60%) and test (40%) sets
set.seed(1000)

inTrain <- createDataPartition(
              y = paste(listings$daily_price,
                        listings$zipcode,
                        listings$neighbourhood),
              p = .6, list = FALSE)

# Filtering out rarely occurring factor levels to avoid a "new levels" error at prediction time
drop_rare_levels <- function(df) {
  df %>% filter(!room_type %in% c("Shared room"),
                !property_type %in% c("Boutique hotel", "Bungalow", "Cabin", "Chalet",
                                      "Nature lodge", "Villa", "Serviced apartment"),
                !bed_type %in% c("Airbed"))
}

listings_train <- drop_rare_levels(listings[inTrain,])

listings_test <- drop_rare_levels(listings[-inTrain,])

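# Spatial lag: average (log) daily price of each listing's 5 nearest neighbours,
# divided by the listing's own guest capacity (accommodates)
coords.test.training <- st_coordinates(listings_train) 

neighborList.training <- knn2nb(knearneigh(coords.test.training, 5))

# Row-standardised weights: each of the 5 neighbours gets weight 1/5
spatialWeights.training <- nb2listw(neighborList.training, style = "W")

listings_train <- listings_train %>% mutate(k_nearest_price = lag.listw(spatialWeights.training, daily_price) / accommodates)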

# Running the baseline regression on the training set
reg_train <- 
  lm(daily_price ~ ., data = as.data.frame(listings_train) %>% 
                             dplyr::select(daily_price,
                                           room_type,
                                           near_ATTRAC,
                                           k_nearest_price,
                                           number_of_reviews,
                                           property_type,
                                           bathrooms,
                                           bedrooms,
                                           review_scores_rating,
                                           cancellation_policy,
                                           host_identity_verified,
                                           accommodates,
                                           beds,
                                           bed_type,
                                           host_is_superhost,
                                           cleaning_fee,
                                           security_deposit,
                                           extra_people,
                                           availability_30,
                                           zipcode,
                                           cafes,
                                           neighbourhood,
                                           distance))

# R² of the baseline model
summary(reg_train)$r.squared

# Table 4 reports the multiple R, i.e. the square root of R²
sqrt(summary(reg_train)$r.squared) %>%
  kbl(caption = "<center><strong>Table 4: Multiple R (Square Root of R²)</strong></center>", escape = FALSE, format = "html") %>%
  kable_classic(full_width = F, html_font = "Cambria")

# Adding baseline regression predictions and error metrics to the training set
listings_train <- listings_train %>% na.omit()

listings_train <-
  listings_train %>%
  mutate(Regression = "Baseline Regression",
         # daily_price is on the log scale, so predictions and observations are back-transformed with exp()
         daily_price.predict = exp(predict(reg_train, listings_train)),
         daily_price.error = daily_price.predict - exp(daily_price),
         daily_price.abserror = abs(daily_price.predict - exp(daily_price)),
         # Absolute percentage error, computed here relative to the *predicted* price
         daily_price.ape = (abs(daily_price.predict - exp(daily_price))) / daily_price.predict) %>%
  filter(!is.na(daily_price.abserror))

# Saving results in a table
results_train <- data.frame(MAE = mean(listings_train$daily_price.abserror, na.rm = T), MAPE = mean(listings_train$daily_price.ape, na.rm = T))

results_train %>%
  kbl(caption = "<center><strong>Table 5: Training Results</strong></center>", escape = FALSE, format = "html") %>%
  kable_classic(full_width = F, html_font = "Cambria")

**Table 5: Training Results**

| MAE | MAPE |
|---|---|
| 24.64884 | 0.2188603 |

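We are now ready for our first regression analysis, with the data split into a training set (60%) and a test set (40%). On the training set, our baseline regression reaches an R of about 0.80 (Table 4). This does not mean that predictions are off by 20% on average. R² (of which R is the square root) measures the share of the variation in log daily price that the model explains, not the size of the prediction errors. To measure the errors themselves, we also consider the mean absolute error (MAE) and the mean absolute percentage error (MAPE). On the training set, these are about $24.65 and 21.9% (Table 5).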

We analyse the error metrics in detail for the test set. Before checking how the model performs on unseen data, however, we first want to understand which variables contribute most to its accuracy.
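For reference, here is a minimal sketch of MAE and MAPE for a model fitted on log prices. It uses the textbook definition of MAPE, which divides each absolute error by the observed value. The report's code above divides by the predicted price instead, so the two versions give different numbers. The helper error_metrics() is hypothetical.

```r
# Hypothetical helper: MAE and MAPE on the dollar scale for a model that
# predicts log prices. MAPE here uses the standard denominator (observed price).
error_metrics <- function(log_actual, log_predicted) {
  actual    <- exp(log_actual)
  predicted <- exp(log_predicted)
  abs_err   <- abs(predicted - actual)
  c(MAE  = mean(abs_err),
    MAPE = mean(abs_err / actual))
}

# Toy usage: observed $100 and $200, predicted $110 and $180
error_metrics(log(c(100, 200)), log(c(110, 180)))
```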

set.seed(1000)

variable <- c("daily_price", "room_type", "near_ATTRAC", "number_of_reviews", "property_type", "bathrooms", "bedrooms", "review_scores_rating", "cancellation_policy", "host_identity_verified", "accommodates", "beds", "bed_type", "host_is_superhost", "cleaning_fee", "security_deposit", "extra_people", "availability_30", "distance", "k_nearest_price", "zipcode")

# Add one variable at a time (in the order of `variable`) and record R² after each step.
# Note: this loop overwrites reg_train, so the test-set predictions below use the
# last model fitted here (all variables in `variable`).
results <- vector()

for (i in seq_along(variable)) {
  words <- variable[1:i]

  reg_train <- 
    lm(daily_price ~ ., data = as.data.frame(listings_train) %>% 
                               dplyr::select(all_of(words)))

  results[[i]] <- summary(reg_train)$r.squared
}

# Change in R² contributed by each variable, given the variables added before it
results <- as.data.frame(cbind(variable, results)) %>% mutate(lagged_results = lag(results), change_r_sqrd = as.numeric(results) - as.numeric(lagged_results)) %>% select(variable, change_r_sqrd)

results <- results[order(results$change_r_sqrd),]

results %>%
  kbl(caption = "<center><strong>Table 6: Change in R² by Variable Order</strong></center>", escape = FALSE, format = "html") %>%
  kable_classic(full_width = F, html_font = "Cambria")

How does each variable change R²? We wrote a loop that adds the variables one at a time, in the order in which they are listed in the loop object, and records the change in R² at each step. Table 6 sorts the variables by that change. Its leftmost column is each variable's position in the original order.

The astute reader will notice that variables added earlier tend to have a greater impact on R². They are the first to add information to the regression, while later variables can only add information that is not already captured.
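This order dependence is easy to reproduce. The sketch below applies the same "add one variable at a time" idea to R's built-in mtcars data, not the Airbnb data, using a hypothetical helper, incremental_r2(). Swapping the order of the same two predictors changes how much R² each one appears to contribute, while the final R² stays the same.

```r
# Hypothetical helper: gain in R² from each predictor, added in the given order
incremental_r2 <- function(data, response, predictors) {
  r2 <- numeric(length(predictors))
  for (i in seq_along(predictors)) {
    f <- reformulate(predictors[seq_len(i)], response = response)
    r2[i] <- summary(lm(f, data = data))$r.squared
  }
  setNames(diff(c(0, r2)), predictors)
}

incremental_r2(mtcars, "mpg", c("wt", "hp"))  # wt first: wt gets most of the credit
incremental_r2(mtcars, "mpg", c("hp", "wt"))  # hp first: hp gets most of the credit
```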

However, we want to learn about the impact of k-nearest (k_nearest_price), a spatial-lag variable that is usually added after the basic regression has been run in order to capture the spatial process. The question therefore becomes: how good a predictor is the spatial process when it is added at the end? This matters because strong predictors change the accuracy substantially even when they are added last.

Added near the end, k-nearest increases R² by only 0.003, which might suggest that there is no spatial process at all.

However, adding a neighbourhood effect, with zipcode as the last variable after k-nearest, changes R² by 0.07. That effect is more than 20 times as large as that of k-nearest.

In other words, there is a weak spatial process, but k-nearest captures it poorly or not at all. This may be because our AirBnBs are far apart, or because nearby apartments can have very different hedonic features, which would explain why the k-nearest variable is ineffective.
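For readers unfamiliar with spdep, the k-nearest variable boils down to a simple computation. With row-standardised ("W") weights on a k-nearest-neighbour list, lag.listw() returns, for each listing, the plain average of its k neighbours' values. The base-R sketch below, knn_lag(), is a hypothetical illustration of that idea. Ties in distance may be broken differently than in spdep, and it omits the report's division by accommodates.

```r
# Hypothetical helper: average of `value` over each point's k nearest
# neighbours (the point itself excluded), i.e. a row-standardised kNN lag.
knn_lag <- function(xy, value, k = 5) {
  d <- as.matrix(dist(xy))
  diag(d) <- Inf  # a point is never its own neighbour
  unname(apply(d, 1, function(row) mean(value[order(row)[seq_len(k)]])))
}

# Toy usage: four points on a line
xy    <- cbind(c(0, 1, 3, 10), 0)
value <- c(10, 20, 30, 100)
knn_lag(xy, value, k = 1)
knn_lag(xy, value, k = 2)
```

If listings with similar prices are scattered rather than clustered, this average carries little information about a listing's own price, which is consistent with the small R² gain reported above.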

Test Regression

# Spatial lag for the test set, computed as for training but with k = 3
# nearest neighbours (the training set used k = 5)
coords.test.2 <- st_coordinates(listings_test) 

neighborList.test.2 <- knn2nb(knearneigh(coords.test.2, 3))

spatialWeights.test.2 <- nb2listw(neighborList.test.2, style = "W")

listings_test <- listings_test %>% mutate(k_nearest_price = lag.listw(spatialWeights.test.2, daily_price) / accommodates)

# Applying the regression model to the test set and adding error metrics
# (as in training: prices are back-transformed from the log scale, and the
# absolute percentage error is relative to the predicted price)
listings_test <-
  listings_test %>%
  mutate(Regression = "Baseline Regression",
         daily_price.predict = exp(predict(reg_train, listings_test)),
         daily_price.error = daily_price.predict - exp(daily_price),
         daily_price.abserror = abs(daily_price.predict - exp(daily_price)),
         daily_price.ape = (abs(daily_price.predict - exp(daily_price))) / daily_price.predict) %>%
  filter(daily_price < 3000000, !is.na(daily_price.abserror))

# Saving results in a table
results_test <- data.frame(MAE = mean(listings_test$daily_price.abserror, na.rm = T), MAPE = mean(listings_test$daily_price.ape, na.rm = T))

results_test %>%
  kbl(caption = "<center><strong>Table 7: Test Results</strong></center>", escape = FALSE, format = "html") %>%
  kable_classic(full_width = F, html_font = "Cambria")

On the test set, our model achieves an MAE of $21 and a MAPE of 19.5% (Table 7). Given that our data set lacks some of the most essential predictors, such as floor area, these results are solid.

To understand where our model has weaknesses, we map the spatial distribution of errors.

# Visualizing spatial error
ggplot() +
  geom_sf(data = nhoods, fill = "white", colour = "black") +
  geom_sf(data = listings_test, aes(colour = q5(daily_price.error)),
          size = 1) +
  scale_colour_manual(values = palette6,
                      labels = qBr(listings_test, "daily_price.error"),
                      name = "Quintile \n Breaks")+
   labs(title = "Spatial Distribution of Regression Residuals") +
  theme_void()

Fig. 30

# Moran's I test on test-set errors, using 3-nearest-neighbour weights and 999 permutations
coords.test.3 <- st_coordinates(listings_test) 

neighborList.test.3 <- knn2nb(knearneigh(coords.test.3, 3))

spatialWeights.test.3 <- nb2listw(neighborList.test.3, style = "W")

moranTest <- moran.mc(listings_test$daily_price.error, 
                      spatialWeights.test.3, nsim = 999)

# moranTest$res holds the 999 permuted statistics followed by the observed one
permuted_I <- data.frame(I = moranTest$res[1:999])

morans_plot <- ggplot(permuted_I, aes(x = I)) +
  geom_histogram(binwidth = 0.01) +
  geom_vline(xintercept = moranTest$statistic, colour = airbnb_color, linewidth = 1) +
  scale_x_continuous(limits = c(-1, 1)) +
  labs(
    title = "Observed and permuted Moran's I",
    subtitle = "Observed Moran's I in pink",
    x = "Moran's I",
    y = "Count"
  ) +
  theme_bw() + 
  theme(
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.5),  # thin panel border
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  )

morans_plot

Fig. 31

Visually, the residuals show no obvious spatial pattern. This suggests that our model captured whatever spatial process was present in the first place. We test this hypothesis with Moran’s I.

As we expected, the Moran’s I plot indicates no global spatial clustering in the error terms. The observed Moran’s I does not stand out from the distribution of permuted values.
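To make the permutation test behind Fig. 31 concrete, here is a base-R sketch of global Moran's I with row-standardised kNN weights and a one-sided permutation p-value. This is the same idea as spdep::moran.mc() with its default "greater" alternative. The helpers knn_weights(), morans_i(), and moran_perm_test() are hypothetical, and spdep's exact p-value formula may differ slightly from the one used here.

```r
# Hypothetical helper: dense row-standardised k-nearest-neighbour weight matrix
knn_weights <- function(xy, k = 3) {
  d <- as.matrix(dist(xy))
  diag(d) <- Inf
  n <- nrow(d)
  w <- matrix(0, n, n)
  for (i in seq_len(n)) w[i, order(d[i, ])[seq_len(k)]] <- 1 / k
  w
}

# Global Moran's I: (n / sum(w)) * sum_ij w_ij z_i z_j / sum_i z_i^2
morans_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Compare the observed I with I computed on randomly shuffled values
moran_perm_test <- function(x, w, nsim = 999) {
  observed <- morans_i(x, w)
  permuted <- replicate(nsim, morans_i(sample(x), w))
  p_value  <- (sum(permuted >= observed) + 1) / (nsim + 1)
  list(statistic = observed, p_value = p_value, permuted = permuted)
}

# Toy usage: two spatially separated groups give strong positive clustering
set.seed(1)
xy  <- cbind(1:20, 0)
x   <- c(rep(0, 10), rep(10, 10))
res <- moran_perm_test(x, knn_weights(xy, k = 3), nsim = 199)
res$statistic
res$p_value
```

Residuals without spatial structure would instead give an observed I inside the bulk of the permuted values and a large p-value, which is the situation described above.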

How can we interpret the spatial distribution of errors? First, our model tends to undervalue daily prices. This makes intuitive sense, as very few of our features clearly isolated expensive listings. As shown above, the k-nearest effect was also small. This suggests that characteristics internal to the listings drive much of their rise or fall in price.

Cross-Validation

# 100-fold cross-validation settings
fitControl <- trainControl(method = "cv", number = 100)

set.seed(825)

# Running 100-fold CV on the training set
reg.cv <- 
  train(daily_price ~ ., data = st_drop_geometry(listings_train) %>% 
                               dplyr::select(daily_price,
                                             room_type,
                                             near_ATTRAC,
                                             k_nearest_price,
                                             number_of_reviews,
                                             property_type,
                                             bathrooms,
                                             bedrooms,
                                             review_scores_rating,
                                             cancellation_policy,
                                             host_identity_verified,
                                             accommodates,
                                             beds,
                                             bed_type,
                                             host_is_superhost,
                                             cleaning_fee,
                                             security_deposit,
                                             extra_people,
                                             availability_30,
                                             zipcode,
                                             neighbourhood,
                                             distance), 
     method = "lm", trControl = fitControl, na.action = na.pass)

reg.cv$resample[1:100,]

ggplot(reg.cv$resample) +
  aes(MAE) +
  geom_histogram(bins=15, fill = airbnb_color) +
  labs(title="Frequency Distribution of MAE for 100 folds CV",
       x="MAE",
       y="Frequency")  +
  theme_bw() +
    theme(panel.grid.minor = element_blank())

Fig. 32

We now test the generalizability of our model with 100-fold cross-validation. The MAE across folds peaks around 0.2. Because the model is fit to log daily price, this MAE is in log units and corresponds to an error of roughly 20%, in line with the test-set MAPE of 19.5%. The distribution is approximately normal, with one outlier to the right. This suggests that our features capture a systematic pattern in daily pricing. Additional hedonic data such as square footage would likely improve the model substantially, although the model is already good enough for a beta test.

Because the city of Amsterdam holds size data for all listings, this feature could be integrated into our model.
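The cross-validation above relies on caret::train() with trainControl(method = "cv", number = 100). As an independent illustration of what k-fold cross-validation does, the base-R sketch below shuffles rows into k folds, fits a linear model on all but one fold, and records the MAE on the held-out fold. The helper kfold_mae() is hypothetical and is run on R's built-in mtcars data with 8 folds rather than on the Airbnb data. MAE is reported on the scale of the model's response, which for our model is log daily price.

```r
# Hypothetical helper: per-fold MAE from k-fold cross-validation of lm().
# Note: calls set.seed(), which resets the global RNG for reproducible folds.
kfold_mae <- function(formula, data, k = 10, seed = 825) {
  set.seed(seed)
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))
  response <- all.vars(formula)[1]
  vapply(seq_len(k), function(f) {
    fit  <- lm(formula, data = data[fold != f, ])
    test <- data[fold == f, ]
    mean(abs(predict(fit, newdata = test) - test[[response]]))
  }, numeric(1))
}

mae_by_fold <- kfold_mae(mpg ~ wt + hp, mtcars, k = 8)
summary(mae_by_fold)
```

A narrow, roughly symmetric spread of fold MAEs, as in Fig. 32, indicates that performance does not hinge on one particular subset of the data.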

Conclusion

We believe that policy enforcement is the most effective way for Amsterdam to create a more equitable rental market for its citizens. As the first city to officially work with Airbnb to counter hyper-gentrification, Amsterdam can also be a global leader in short-term rental policy. Our tool helps monitor the policies issued by the Amsterdam city council, which we believe are the right ones to tackle the problem.

We additionally offer a fair-housing feature that goes one step further. It could, for example, flag listings whose price lies more than two standard deviations above the mean, that is, outside the range that would contain about 95% of listings if prices were normally distributed. Beyond these clear use cases, our software has another advantage: once it is connected to the city's database, the accuracy and generalizability of many of our predictions can be improved. It is also easy to add policies that the city council wants monitored in the future, and the tool could be extended to use cases beyond short-term rentals.

The app's diverse features, coupled with its adaptable use cases, make it an ideal candidate for a pilot project addressing overtourism and the tourism-based gentrification exacerbated by the short-term rental market and platforms like Airbnb.
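The proposed fair-pricing flag could look like the sketch below. It marks listings priced more than two standard deviations above the mean, either city-wide or within an optional grouping such as neighbourhood. The helper flag_overpriced(), the optional grouping, and the toy prices are assumptions for illustration, not part of the FairBnB app as described.

```r
# Hypothetical helper: TRUE where price exceeds mean + n_sd * sd, computed
# city-wide or within each group. Groups with a single listing get NA,
# because sd() of one value is NA.
flag_overpriced <- function(price, group = NULL, n_sd = 2) {
  if (is.null(group)) group <- rep(1, length(price))
  mu <- ave(price, group, FUN = mean)
  s  <- ave(price, group, FUN = sd)
  price > mu + n_sd * s
}

# Toy usage: 19 listings at $100 and one at $1000
prices <- c(rep(100, 19), 1000)
which(flag_overpriced(prices))
```

Comparing listings within neighbourhoods rather than city-wide would avoid flagging listings that are simply located in an expensive area, at the cost of noisier thresholds in neighbourhoods with few listings.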

Presentation Link

https://www.youtube.com/watch?v=xCjvpAW5-3A

DutchNews.nl. 2022. “Airbnb-Style Rentals Harmful to Amsterdam, Says Tourism Chief.” https://www.dutchnews.nl/2022/10/airbnb-style-rentals-harmful-to-amsterdam-says-tourism-chief/.↩︎
Dictionary.com. n.d. “Definition of Overtourism.” https://www.dictionary.com/browse/overtourism.↩︎
Statista. 2023. “From Restrictions to Reimagination: Amsterdam’s Vision for a New Tourist Economy.” https://oecdcogito.blog/2023/05/10/from-restrictions-to-reimagination-amsterdams-vision-for-a-new-tourist-economy/.↩︎
Michigan State University International Law Review. 2018. “Regulating a Largely Unregulated Market: What Amsterdam Has Done to Control Airbnb Rentals.” https://www.msuilr.org/msuilr-legalforum-blogs/2018/11/19/regulating-a-largely-unregulated-market-what-amsterdam-has-done-to-control-airbnb-rentals.↩︎
Michigan State University International Law Review. 2018. “Regulating a Largely Unregulated Market: What Amsterdam Has Done to Control Airbnb Rentals.” https://www.msuilr.org/msuilr-legalforum-blogs/2018/11/19/regulating-a-largely-unregulated-market-what-amsterdam-has-done-to-control-airbnb-rentals.↩︎
Radnović, Branislav et al. 2019. “Marketing Influence of Touristic Airbnb Application on Vacation Rental Market in London, Paris, Berlin, Rome and Amsterdam.” Tourism and Management Studies. https://doi.org/10.31410/tmt.2019.571.↩︎
Airbnb. 2023. “Dutch Regulations for Airbnb Hosts.” https://www.airbnb.com/help/article/860.↩︎
Amsterdam, City of. 2023. “Applying for a Holiday Rental Permit in Amsterdam.” https://www.amsterdam.nl/en/housing/holiday-rentals/applying-permit/.↩︎
Thompson, Derek. John. 2020. “Why Do American Houses Have So Many Bathrooms?” The Atlantic, January 1. https://www.theatlantic.com/ideas/archive/2020/01/why-do-american-houses-have-so-many-bathrooms/605338/.↩︎
Molero, Molero. 2023. “Best Cities for Transportation: Public Transit, EVs, Cycling Networks.” Bloomberg, December 4. https://www.bloomberg.com/news/articles/2023-12-04/best-cities-for-transportation-public-transit-evs-cycling-networks.↩︎

FairBnB

Nohman Akharti & Jarred Randall

2023-12-15
