1 Executive Summary

This report investigates the tenancy nature of Airbnbs in Sydney and the consequent effects on the rental market. The analysis suggests that Airbnbs are primarily used by tourists, and that this may be having detrimental effects on coastal suburbs, where local residents risk being replaced by short-term tourists, thus affecting the rental market.


2 Full Report

The data:

# Read in the data
## Option 2: reduced Airbnb data (fewer variables)
listings = read.csv("http://www.maths.usyd.edu.au/u/UG/JM/DATA1001/r/current/projects/2020data/listingsSMALL.csv")

However, this dataset has too many variables, which makes the analysis more convoluted. The listingsTINY file provided on Canvas did not contain the variables I wished to analyse, so a new file was created.

# Read in the data
library(readxl)
listings = read_excel("/Users/emilyralph/Documents/R Studio/listings.xlsx")
listings
## # A tibble: 40,405 x 14
##    zipcode location propertytype roomtype accommodates bathrooms bedrooms  beds
##      <dbl> <chr>    <chr>        <chr>           <dbl>     <dbl>    <dbl> <dbl>
##  1    2011 Potts P… Apartment    Private…            1      NA          1     1
##  2    2093 Balgowl… House        Entire …            6       3          3     3
##  3    2010 Darling… Apartment    Private…            2       1          1     1
##  4    2010 Darling… Loft         Entire …            2       1          1     1
##  5    2026 Bondi B… House        Entire …           11       3          5     7
##  6    2026 Bondi B… Apartment    Entire …            2       1          1     1
##  7    2088 Mosman,… House        Entire …            4       1          1     1
##  8    2022 Bondi J… House        Entire …            8       2          4     5
##  9    2015 Alexand… House        Entire …            7       2.5        3     7
## 10    2026 Bondi B… Apartment    Entire …            4       1          2     2
## # … with 40,395 more rows, and 6 more variables: price <dbl>,
## #   weekly_price <dbl>, monthly_price <dbl>, minimum_nights <dbl>,
## #   maximum_nights <dbl>, availability365 <dbl>
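
The reduced file above was prepared outside R. As an alternative, the same kind of subset could be taken inside R, which keeps the step reproducible. The following is only a minimal sketch on a toy data frame: the column names in `full` (including `id` and `host_name`) are assumptions for illustration and are not the real names in listingsSMALL.csv.

```r
# Toy stand-in for the full listings table; the column names here are
# assumptions, not the real names in listingsSMALL.csv.
full = data.frame(
  id = 1:3,
  zipcode = c(2011, 2093, 2010),
  price = c(65, 470, 100),
  host_name = c("A", "B", "C"),
  minimum_nights = c(2, 5, 2),
  maximum_nights = c(180, 22, 7)
)

# Variables wanted for the analysis
keep = c("zipcode", "price", "minimum_nights", "maximum_nights")

# Guard against typos: stop if a requested column is missing
missing_cols = setdiff(keep, names(full))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# Keep only the wanted columns, in the order listed
reduced = full[, keep]
dim(reduced)
```

Selecting by name rather than by position means the code still works if the column order in the source file changes.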

2.1 Initial Data Analysis (IDA)

2.1.1 Source of data

The data was sourced from InsideAirbnb, an independently run tool for exploring how Airbnb is used globally. The Sydney data was gathered on 16/03/2020 from the listing details provided by hosts. Being only a month old, this raw data gives a current insight into the use of Airbnbs in Sydney.

2.1.2 Quick snapshot

## size of data
dim(listings)
## [1] 40405    14
## name of variables
names(listings)
##  [1] "zipcode"         "location"        "propertytype"    "roomtype"       
##  [5] "accommodates"    "bathrooms"       "bedrooms"        "beds"           
##  [9] "price"           "weekly_price"    "monthly_price"   "minimum_nights" 
## [13] "maximum_nights"  "availability365"
## R's classification of variables
str(listings)
## Classes 'tbl_df', 'tbl' and 'data.frame':    40405 obs. of  14 variables:
##  $ zipcode        : num  2011 2093 2010 2010 2026 ...
##  $ location       : chr  "Potts Point, Australia" "Balgowlah, Australia" "Darlinghurst, Australia" "Darlinghurst, Australia" ...
##  $ propertytype   : chr  "Apartment" "House" "Apartment" "Loft" ...
##  $ roomtype       : chr  "Private room" "Entire home/apt" "Private room" "Entire home/apt" ...
##  $ accommodates   : num  1 6 2 2 11 2 4 8 7 4 ...
##  $ bathrooms      : num  NA 3 1 1 3 1 1 2 2.5 1 ...
##  $ bedrooms       : num  1 3 1 1 5 1 1 4 3 2 ...
##  $ beds           : num  1 3 1 1 7 1 1 5 7 2 ...
##  $ price          : num  65 470 100 129 600 ...
##  $ weekly_price   : num  NA 3000 800 791 4725 ...
##  $ monthly_price  : num  NA NA 3000 2793 13000 ...
##  $ minimum_nights : num  2 5 2 3 14 4 2 7 6 4 ...
##  $ maximum_nights : num  180 22 7 365 365 90 90 28 21 90 ...
##  $ availability365: num  352 198 309 0 286 87 212 365 0 353 ...

R classified the variables “location”, “propertytype” and “roomtype” as characters, but they are categorical and should be factors. The variables “accommodates”, “bedrooms”, “bathrooms” and “beds” were classified as numerical; they are also treated as factors here, since each takes only a limited set of discrete values.

## reclassification of the variables mentioned above
listings$location = as.factor(listings$location)
listings$propertytype = as.factor(listings$propertytype)
listings$roomtype = as.factor(listings$roomtype)
listings$accommodates = as.factor(listings$accommodates)
listings$bedrooms = as.factor(listings$bedrooms)
listings$bathrooms = as.factor(listings$bathrooms)
listings$beds = as.factor(listings$beds)

New classifications:

str(listings)
## Classes 'tbl_df', 'tbl' and 'data.frame':    40405 obs. of  14 variables:
##  $ zipcode        : num  2011 2093 2010 2010 2026 ...
##  $ location       : Factor w/ 855 levels "–†–µ–¥—Ñ–µ—Ä–Ω, Australia",..: 637 41 249 249 101 101 533 109 10 101 ...
##  $ propertytype   : Factor w/ 43 levels "Aparthotel","Apartment",..: 2 26 2 29 26 2 26 26 26 2 ...
##  $ roomtype       : Factor w/ 4 levels "Entire home/apt",..: 3 1 3 1 1 1 1 1 1 1 ...
##  $ accommodates   : Factor w/ 16 levels "1","2","3","4",..: 1 6 2 2 11 2 4 8 7 4 ...
##  $ bathrooms      : Factor w/ 20 levels "0","0.5","1",..: NA 7 3 3 7 3 3 5 6 3 ...
##  $ bedrooms       : Factor w/ 14 levels "0","1","2","3",..: 2 4 2 2 6 2 2 5 4 3 ...
##  $ beds           : Factor w/ 23 levels "0","1","2","3",..: 2 4 2 2 8 2 2 6 8 3 ...
##  $ price          : num  65 470 100 129 600 ...
##  $ weekly_price   : num  NA 3000 800 791 4725 ...
##  $ monthly_price  : num  NA NA 3000 2793 13000 ...
##  $ minimum_nights : num  2 5 2 3 14 4 2 7 6 4 ...
##  $ maximum_nights : num  180 22 7 365 365 90 90 28 21 90 ...
##  $ availability365: num  352 198 309 0 286 87 212 365 0 353 ...
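
One consequence of storing counts such as “bedrooms” as factors is worth noting: in the output above, the first listing has 1 bedroom but is stored as level code 2, because the level "0" comes first. The sketch below, on made-up toy data, shows a compact way to convert several columns at once and how to recover the original numbers if they are needed later (for example, to compute a mean). The data frame `toy` and its values are assumptions for illustration only.

```r
# Toy data with the same kinds of columns as listings (values are made up)
toy = data.frame(
  roomtype = c("Private room", "Entire home/apt", "Private room"),
  bedrooms = c(1, 3, 5),
  stringsAsFactors = FALSE
)

# Convert several columns to factors in one step
factor_cols = c("roomtype", "bedrooms")
toy[factor_cols] = lapply(toy[factor_cols], as.factor)

# Pitfall: as.numeric() on a factor returns the level codes, not the values
codes = as.numeric(toy$bedrooms)                 # level codes: 1 2 3
values = as.numeric(as.character(toy$bedrooms))  # original values: 1 3 5
mean(values)
```

Going through `as.character()` first is the usual way to get the recorded values back from a factor.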


This dataset has limitations: we can only analyse information that the property host has provided. InsideAirbnb doesn’t include data from visitors, such as demographics and average number of nights stayed, which would have been beneficial for analysis.
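
A further limitation visible in the snapshot is missing values: “bathrooms”, “weekly_price” and “monthly_price” all show NA in the first few rows. A quick count per column shows how much of each variable is usable. The sketch below uses only the first four rows printed above as a toy data frame, so the counts it produces describe that sample and not the full dataset.

```r
# First four rows of selected columns, copied from the printed output above
toy = data.frame(
  bathrooms = c(NA, 3, 1, 1),
  weekly_price = c(NA, 3000, 800, 791),
  monthly_price = c(NA, NA, 3000, 2793),
  price = c(65, 470, 100, 129)
)

# Number of missing values in each column
na_counts = colSums(is.na(toy))

# Share of rows that are missing, as a percentage
na_percent = round(100 * na_counts / nrow(toy), 1)
na_counts
```

Running the same two lines on `listings` would give the counts for the full dataset.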


2.2 Exploring Data / Research Question

The question I’m exploring is: are Airbnbs in Sydney predominantly used by short-term tourists or by long-term tenants?

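library(ggplot2)
# Minimum and maximum number of nights each host sets for a booking
maximum = listings$maximum_nights
minimum = listings$minimum_nights
# Linear model of minimum nights on maximum nights (the same fit geom_smooth draws)
regression = lm(minimum ~ maximum)
# Scatter plot coloured by room type, with the fitted line in yellow
p = ggplot(listings, aes(maximum, minimum))
p + geom_point(aes(colour = factor(roomtype))) + ggtitle("Minimum nights vs maximum nights") + geom_smooth(method = "lm", color = "yellow")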
## `geom_smooth()` using formula 'y ~ x'


A host renting short-term to tourists would set a small maximum number of nights for a booking, whereas a host seeking long-term tenants would set a large minimum number of nights. This pattern can be seen in the graph. In the bottom left-hand corner there is a visible cluster of thousands of data points, showing that a large number of properties cap their maximum stay and keep their minimum stay low so as to encourage tourism. In the top right-hand corner there are a couple of properties that are hosted long-term, as shown by their high minimum nights. However, some hosts are clearly indifferent as to who they target, as the cluster in the bottom right-hand corner demonstrates: these listings combine a low minimum with a high maximum. As a result, the yellow regression line fits the data poorly, and a linear model is not a good description of the relationship. If hosts had to choose between short-term and long-term renting, a clearer relationship would be expected.
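
The visual reading above could be backed with numbers in two ways: the R-squared of the linear model, and a count of listings in each corner of the plot. The following is a minimal sketch on made-up toy data. The cut-offs of 7 and 30 nights are illustrative assumptions, not values taken from this report or from InsideAirbnb, and the labels in `target` are hypothetical.

```r
# Toy data (made-up values) standing in for listings
toy = data.frame(
  minimum_nights = c(1, 2, 2, 3, 90, 120, 1, 2),
  maximum_nights = c(7, 14, 21, 28, 365, 365, 365, 365)
)

# How well does a straight line describe the relationship?
fit = lm(minimum_nights ~ maximum_nights, data = toy)
r_squared = summary(fit)$r.squared

# Label each listing using illustrative cut-offs (assumptions)
short_min = toy$minimum_nights <= 7    # accepts short stays
short_max = toy$maximum_nights <= 30   # caps stays at about a month
toy$target = ifelse(short_min & short_max, "short-term only",
             ifelse(!short_min, "long-term only", "either"))

# Number of listings in each group
table(toy$target)
```

An R-squared close to 0 would support the conclusion that the line is a poor fit, and the table gives the size of each cluster instead of relying on how dense the points look.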

2.3 Domain knowledge and research

From their research on the effect of Airbnb listings in Sydney, Alizadeh, Farid & Sarkar (2018) concluded that Airbnb makes up only 3.38% of the total rental market in Sydney, so the overall impact is expected to be minimal. However, this doesn’t mean that no suburb will be heavily impacted, because the impact is not evenly distributed geographically and socio-economically. Airbnb may have a greater impact in areas that are strategically located for tourism, such as beachside suburbs, because there is a danger that local residents will move out so that tourists can “move” in, thus affecting the rental housing market for long-term residents.
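
The uneven geographic distribution could be checked against this dataset by counting listings per suburb. The sketch below uses a toy data frame with made-up counts for three suburbs that appear in the data. It measures each suburb's share of Airbnb listings only, which is not the same as Airbnb's share of that suburb's rental market.

```r
# Toy data: suburb names appear in the dataset, counts are made up
toy = data.frame(
  location = c("Darlinghurst, Australia", "Darlinghurst, Australia",
               "Darlinghurst, Australia", "Potts Point, Australia",
               "Potts Point, Australia", "Balgowlah, Australia")
)

# Listings per suburb, largest first
counts = sort(table(toy$location), decreasing = TRUE)

# Each suburb's share of all listings, as a percentage
share = round(100 * prop.table(counts), 1)
head(share, 3)
```

Applied to `listings$location`, a result dominated by a few beachside suburbs would be consistent with the concern raised above.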


2.3.1 References

Tooran Alizadeh, Reza Farid & Somwrita Sarkar (2018) Towards Understanding the Socio-Economic Patterns of Sharing Economy in Australia: An Investigation of Airbnb Listings in Sydney and Melbourne Metropolitan Regions, Urban Policy and Research, 36:4, 445-463, DOI: 10.1080/08111146.2018.1460269