Email : parulverma0997@gmail.com College : BITS Pilani K.K. Birla Goa campus
Tourism and hospitality play an important role in the global economy. With the advent of technology, this industry has seen its fair share of digitisation, making travel and tourism easier. Airbnb is one such successful venture. Airbnb is an American company which operates an online marketplace and hospitality service for people to lease or rent short-term lodging, including holiday cottages, apartments, homestays, hostel beds and hotel rooms, and to participate in or facilitate experiences related to tourism. It acts as a broker and receives a percentage service fee on every booking. The company caters to two main client bases: the hosts and the guests. Hosts are the people who list the accommodation they offer on the website, while guests are the ones who book their preferred accommodation. The company tries to make prospective host-guest interaction easy and convenient.
Airbnb can be accessed via its websites or its mobile apps for iOS, Apple Watch and Android. Registration and account creation are free. On each booking, the company charges guests a 6-12% guest service fee and charges hosts a 3-5% host service fee. Hosts can also offer “experiences”, such as excursions, to guests for an additional charge, of which Airbnb takes 20% as commission. The company has over 4 million lodging listings in 65,000 cities and 191 countries and has facilitated over 260 million check-ins.
The host decides the rent or fee that the guest has to pay. The transaction is completed after discussion between the two parties and a look at the reviews. However, the type of service and lodging offered varies widely between hosts. Thus, an analysis of the rent with respect to the room type, neighbourhood, host-mandated minimum stay, number of bedrooms and number of people accommodated becomes important.
The overall satisfaction of a customer after their stay helps to gauge whether the customer found the stay, and the price charged, on par with the services offered. In this project, we look at both client bases and ask whether the prices charged and customer satisfaction with the services go hand in hand.
Our study examines which factors may influence the price charged by a host and whether that pricing is corroborated by the guests in their satisfaction ratings. The number of people giving reviews is an important parameter: ratings are subjective, so the number of reviews behind each rating must be taken into account when running group analyses of customer satisfaction. The rental accommodations are situated in different neighbourhoods of Wellington, New Zealand.
WELLINGTON, NZ - Wellington is the capital of New Zealand and its second most populous urban area, with about 412,500 residents. Wellington is a popular tourist destination, with approximately 540,000 international visitors each year, which makes Airbnb’s listings there potentially very profitable. Suitable expansion of Airbnb could thus generate substantial profits in the city. In this study, we mainly focus on what the host-charged rent depends upon, how overall satisfaction ratings vary with the rent, and what can be done to improve business.
Airbnb datasets are publicly available from several sources, including the independent website “Inside Airbnb - Adding data to the debate”. The dataset used in this study has been sourced from “http://tomslee.net/airbnb-data-collection-get-the-data”. It contains 858 rows (listings) and 14 columns (variables). The dataset represents a single “survey” or “scrape” of the Airbnb web site for the city, collected from the public Airbnb web site without logging in.
The key to the dataset is as follows -
room_id: A unique number identifying an Airbnb listing.
host_id: A unique number identifying an Airbnb host.
room_type: One of “Entire home/apt”, “Private room”, or “Shared room”.
borough: A subregion of the city or search area for which the survey is carried out.
neighborhood: A subregion of the city or search area for which the survey is carried out. For cities that have both, a neighbourhood is smaller than a borough.
reviews: The number of reviews that a listing has received. Airbnb has said that 70% of visits end up with a review, so the number of reviews can be used to estimate the number of visits. While not much can be gained from it individually, overall it can give a useful metric of traffic.
overall_satisfaction: The average rating (out of five) that the listing has received from those visitors who left a review.
accommodates: The number of guests a listing can accommodate.
bedrooms: The number of bedrooms a listing offers.
price: The price (in $US) for a night's stay.
minstay: The minimum stay for a visit, as posted by the host.
latitude and longitude: The latitude and longitude of the listing as posted on the Airbnb site (may be off by a few hundred metres).
last_modified: The date and time that the values were read from the Airbnb web site.
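The note on reviews suggests a simple way to estimate traffic from review counts. The sketch below is only an illustration: the review counts are made up, and the 70% rate is the figure quoted in the key above, assumed to hold uniformly across listings.

```r
# Hypothetical review counts for three listings (not taken from the dataset)
reviews <- c(0, 5, 19)

# Assumption from the key above: about 70% of visits end with a review
review_rate <- 0.70

# Estimated number of visits per listing
estimated_visits <- reviews / review_rate
round(estimated_visits, 1)
```

Applied to the real data, the same division on the reviews column would give a rough per-listing traffic estimate, which is more useful in aggregate than for any single listing.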
setwd("C:/Users/Parul Verma/Desktop/Data Analytics Internship/Capstone project/wellington/s3_files/wellington")
Data.df <- read.csv("tomslee_airbnb_wellington_0536_2016-08-19.csv")
The following code chunks describe the dataset. The tables and graphs give a first insight into its structure and the range of values each variable takes.
Descriptive statistics - Using both summary and describe commands -
summary(Data.df)
## room_id host_id room_type
## Min. : 9260 Min. : 2576 Entire home/apt:341
## 1st Qu.: 7001744 1st Qu.:11083016 Private room :503
## Median :10540846 Median :30913924 Shared room : 14
## Mean : 9467715 Mean :33466451
## 3rd Qu.:12661462 3rd Qu.:52886548
## Max. :14614740 Max. :90697138
##
## borough neighborhood reviews
## Mode:logical Willis Street-Cambridge Terrace:116 Min. : 0.00
## NA's:858 Mt Victoria West : 79 1st Qu.: 1.00
## Mt Cook-Wallace Street : 48 Median : 5.00
## Lambton : 41 Mean : 15.07
## Melrose-Houghton Bay-Southgate : 35 3rd Qu.: 19.00
## Thorndon-Tinakori Road : 29 Max. :226.00
## (Other) :510
## overall_satisfaction accommodates bedrooms price
## Min. :2.500 Min. : 1.000 Min. :0.000 Min. : 14.00
## 1st Qu.:4.500 1st Qu.: 2.000 1st Qu.:1.000 1st Qu.: 45.00
## Median :5.000 Median : 2.000 Median :1.000 Median : 68.00
## Mean :4.742 Mean : 2.951 Mean :1.422 Mean : 90.52
## 3rd Qu.:5.000 3rd Qu.: 4.000 3rd Qu.:1.750 3rd Qu.:112.00
## Max. :5.000 Max. :15.000 Max. :7.000 Max. :750.00
## NA's :352
## minstay latitude longitude
## Min. : 1.000 Min. :-41.35 Min. :174.7
## 1st Qu.: 1.000 1st Qu.:-41.31 1st Qu.:174.8
## Median : 1.000 Median :-41.30 Median :174.8
## Mean : 2.009 Mean :-41.29 Mean :174.8
## 3rd Qu.: 2.000 3rd Qu.:-41.28 3rd Qu.:174.8
## Max. :21.000 Max. :-41.15 Max. :174.8
## NA's :38
## last_modified
## 2016-08-19 22:43:50.478953: 1
## 2016-08-19 22:43:55.479945: 1
## 2016-08-19 22:43:58.875648: 1
## 2016-08-19 22:44:02.154513: 1
## 2016-08-19 22:44:07.612797: 1
## 2016-08-19 22:44:09.719711: 1
## (Other) :852
library(psych)
describe(Data.df)
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning
## Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## vars n mean sd median
## room_id 1 858 9467714.98 3903455.63 10540846.50
## host_id 2 858 33466451.07 24191934.01 30913924.00
## room_type* 3 858 1.62 0.52 2.00
## borough* 4 0 NaN NA NA
## neighborhood* 5 858 41.77 21.17 43.50
## reviews 6 858 15.07 26.22 5.00
## overall_satisfaction 7 506 4.74 0.34 5.00
## accommodates 8 858 2.95 1.83 2.00
## bedrooms 9 858 1.42 0.91 1.00
## price 10 858 90.52 74.43 68.00
## minstay 11 820 2.01 2.33 1.00
## latitude 12 858 -41.29 0.03 -41.30
## longitude 13 858 174.78 0.02 174.78
## last_modified* 14 858 429.50 247.83 429.50
## trimmed mad min max
## room_id 9854106.30 3730980.69 9260.00 14614740.00
## host_id 31939541.84 30841789.47 2576.00 90697138.00
## room_type* 1.63 0.00 1.00 3.00
## borough* NaN NA Inf -Inf
## neighborhood* 42.94 24.46 1.00 72.00
## reviews 9.09 7.41 0.00 226.00
## overall_satisfaction 4.78 0.00 2.50 5.00
## accommodates 2.63 0.00 1.00 15.00
## bedrooms 1.22 0.00 0.00 7.00
## price 77.52 43.00 14.00 750.00
## minstay 1.50 0.00 1.00 21.00
## latitude -41.30 0.02 -41.35 -41.15
## longitude 174.78 0.01 174.69 174.85
## last_modified* 429.50 318.02 1.00 858.00
## range skew kurtosis se
## room_id 14605480.00 -0.74 -0.47 133261.78
## host_id 90694562.00 0.37 -0.92 825899.01
## room_type* 2.00 -0.13 -1.20 0.02
## borough* -Inf NA NA NA
## neighborhood* 71.00 -0.27 -0.97 0.72
## reviews 226.00 3.46 16.07 0.90
## overall_satisfaction 2.50 -2.23 10.35 0.02
## accommodates 14.00 2.18 6.65 0.06
## bedrooms 7.00 2.29 6.29 0.03
## price 736.00 3.06 15.46 2.54
## minstay 20.00 5.22 34.09 0.08
## latitude 0.19 1.34 2.73 0.00
## longitude 0.16 -0.06 1.81 0.00
## last_modified* 857.00 0.00 -1.20 8.46
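The two warnings printed by describe come from the borough column, which is entirely NA in this survey (n = 0 in the output above). A minimal sketch of one way to drop such columns before summarising is shown here on a toy data frame with made-up values; the names toy and toy_clean are illustrative only.

```r
# Toy data frame standing in for Data.df (values are made up)
toy <- data.frame(
  price   = c(45, 68, 112),
  borough = c(NA, NA, NA),
  minstay = c(1, NA, 2)
)

# Flag the columns in which every value is missing
all_missing <- vapply(toy, function(col) all(is.na(col)), logical(1))

# Keep only the columns that contain at least one non-missing value
toy_clean <- toy[, !all_missing, drop = FALSE]

names(toy_clean)
```

Columns with only some missing values, such as minstay here, are kept; only the fully empty column is removed.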
ONE-WAY CONTINGENCY TABLES -
ROOM TYPE
table1 <- with(Data.df, table(room_type))
table1
## room_type
## Entire home/apt Private room Shared room
## 341 503 14
OVERALL SATISFACTION
table2 <- with(Data.df, table(overall_satisfaction))
table2
## overall_satisfaction
## 2.5 3.5 4 4.5 5
## 3 4 12 210 277
NUMBER ACCOMMODATED
table3 <- with(Data.df, table(accommodates))
table3
## accommodates
## 1 2 3 4 5 6 7 8 9 10 12 14 15
## 64 483 60 125 44 37 15 20 2 2 4 1 1
NUMBER OF BEDROOMS AVAILABLE
table4 <- with(Data.df, table(bedrooms))
table4
## bedrooms
## 0 1 2 3 4 5 6 7
## 13 630 108 69 29 5 2 2
MINIMUM STAY
table5 <- with(Data.df, table(minstay))
table5
## minstay
## 1 2 3 4 5 6 7 10 14 20 21
## 453 245 52 10 19 4 22 4 5 1 5
NEIGHBOURHOOD
table6 <- with(Data.df, table(neighborhood))
table6
## neighborhood
## Adelaide Aro Street-Nairn Street
## 4 28
## Awarua Berhampore East
## 11 7
## Berhampore West Brooklyn
## 8 23
## Brooklyn South Churton Park North
## 5 9
## Churton Park South Crofton Downs
## 2 3
## Greenacres Grenada Village
## 1 1
## Happy Valley-Owhiro Bay Hataitai North
## 3 15
## Horokiwi Island Bay East
## 2 12
## Island Bay West Johnsonville Central
## 15 7
## Johnsonville East Johnsonville North
## 1 3
## Kaiwharawhara Karaka Bay-Worser Bay
## 2 8
## Karori East Karori North
## 9 6
## Karori Park Karori South
## 8 12
## Kelburn Khandallah Park-Broadmeadows
## 17 2
## Kilbirnie East Kilbirnie West-Hataitai South
## 4 22
## Kingston-Mornington Lambton
## 3 41
## Linden Lyall Bay-Airport-Moa Point
## 3 15
## Makara-Ohariu Maupuia
## 10 2
## Melrose-Houghton Bay-Southgate Miramar
## 35 6
## Miramar North Miramar South
## 6 6
## Miramar West Mitchelltown
## 1 3
## Mt Cook-Wallace Street Mt Victoria West
## 48 79
## Newlands East Newlands North
## 3 3
## Newlands South Newtown East
## 2 24
## Newtown West Ngaio South
## 13 6
## Ngauranga East Northland
## 2 13
## Northland North Oriental Bay
## 4 9
## Paparangi Paparangi West
## 1 1
## Rangoon Heights Raroa
## 12 10
## Roseneath Seatoun
## 18 13
## Seatoun Tunnel West Strathmore Park
## 3 11
## Tawa Central Tawa South
## 2 4
## Te Kainga Thorndon-Tinakori Road
## 12 29
## Vogeltown Vogeltown West
## 4 5
## Wadestown Willis Street-Cambridge Terrace
## 17 116
## Wilton Woodridge
## 12 1
TWO-WAY CONTINGENCY TABLES -
Table1 <- xtabs(~ room_type+overall_satisfaction, data = Data.df)
Table1
## overall_satisfaction
## room_type 2.5 3.5 4 4.5 5
## Entire home/apt 1 1 3 77 107
## Private room 2 3 9 125 170
## Shared room 0 0 0 8 0
Table2 <- xtabs(~ room_type+price, data = Data.df)
Table2
## price
## room_type 14 15 16 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
## Entire home/apt 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
## Private room 1 4 2 0 3 1 2 2 4 1 1 17 4 1 5 22 2 3 4
## Shared room 0 0 0 2 0 0 0 1 2 0 0 1 0 0 0 3 0 0 0
## price
## room_type 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 52 53
## Entire home/apt 1 0 0 1 1 0 0 1 0 3 0 3 2 0 3 1 0 1 4
## Private room 20 3 10 26 5 1 6 27 4 5 6 28 3 4 23 7 1 4 24
## Shared room 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0
## price
## room_type 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 70 71 72 73
## Entire home/apt 0 1 1 3 0 2 7 0 1 0 2 2 1 2 4 0 8 2 0
## Private room 1 0 3 14 3 2 28 1 1 2 10 3 5 8 16 1 11 1 4
## Shared room 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## price
## room_type 74 75 76 78 79 80 81 82 84 87 88 90 91 94 97 98 99 101
## Entire home/apt 6 8 0 6 1 0 1 8 0 1 1 16 3 8 3 5 1 3
## Private room 9 17 1 1 1 1 1 5 2 2 0 6 1 8 1 3 0 1
## Shared room 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## price
## room_type 103 104 105 108 109 110 111 112 113 115 116 119 121 123
## Entire home/apt 1 0 7 2 2 1 2 22 1 1 2 4 3 1
## Private room 1 1 5 0 1 0 0 6 1 0 1 5 0 0
## Shared room 0 0 0 0 0 0 0 0 0 0 0 1 0 0
## price
## room_type 124 125 126 127 128 131 133 135 138 139 141 142 143 144
## Entire home/apt 3 1 2 2 4 7 0 13 1 2 1 5 1 1
## Private room 1 0 0 1 0 1 1 3 0 0 0 0 0 1
## Shared room 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## price
## room_type 146 148 149 150 151 158 165 166 169 172 178 179 180 184
## Entire home/apt 6 2 7 8 1 6 2 1 2 6 1 1 4 1
## Private room 0 0 0 2 0 0 1 0 0 1 0 0 0 0
## Shared room 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## price
## room_type 185 187 191 195 202 205 206 209 210 217 218 221 225 226
## Entire home/apt 1 15 1 1 0 1 1 0 2 1 1 1 10 2
## Private room 0 3 0 1 2 0 0 1 0 0 0 1 1 0
## Shared room 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## price
## room_type 229 236 247 254 262 263 270 274 281 296 297 304 337 338
## Entire home/apt 1 1 1 1 2 8 1 2 1 1 2 1 1 3
## Private room 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Shared room 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## price
## room_type 375 450 487 521 599 750
## Entire home/apt 3 2 1 1 2 0
## Private room 0 0 0 0 0 1
## Shared room 0 0 0 0 0 0
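Because every distinct price gets its own column, the table above is hard to read. One possible alternative is to group prices into bands before cross-tabulating. The sketch below uses a toy data frame with made-up values, and the break points are an arbitrary choice rather than anything derived from the dataset.

```r
# Toy listings standing in for Data.df (values are made up)
toy <- data.frame(
  room_type = c("Entire home/apt", "Private room", "Private room",
                "Shared room", "Entire home/apt", "Private room"),
  price     = c(150, 45, 60, 22, 337, 110)
)

# Group prices into bands; intervals are closed on the right, e.g. (50, 100]
toy$price_band <- cut(
  toy$price,
  breaks = c(0, 50, 100, 200, Inf),
  labels = c("0-50", "51-100", "101-200", "201+")
)

# Cross-tabulate room type against price band
band_table <- xtabs(~ room_type + price_band, data = toy)
band_table
```

With the real data, the same two steps would reduce the table to a handful of columns while still showing that entire homes dominate the higher price bands.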
Table3 <- xtabs(~ minstay+overall_satisfaction, data = Data.df)
Table3
## overall_satisfaction
## minstay 2.5 3.5 4 4.5 5
## 1 3 0 6 111 143
## 2 0 4 6 71 94
## 3 0 0 0 12 14
## 4 0 0 0 2 2
## 5 0 0 0 2 2
## 7 0 0 0 3 1
## 10 0 0 0 1 0
Table4 <- xtabs(~ reviews+overall_satisfaction, data = Data.df)
Table4
## overall_satisfaction
## reviews 2.5 3.5 4 4.5 5
## 3 1 1 0 24 16
## 4 2 0 0 12 20
## 5 0 0 4 12 19
## 6 0 0 0 7 19
## 7 0 0 0 9 5
## 8 0 0 0 12 13
## 9 0 1 1 4 9
## 10 0 1 2 7 9
## 11 0 0 0 5 5
## 12 0 0 0 4 7
## 13 0 0 0 5 10
## 14 0 0 0 5 4
## 15 0 0 0 5 3
## 16 0 0 0 3 9
## 17 0 1 2 2 3
## 18 0 0 0 2 3
## 19 0 0 0 6 6
## 20 0 0 0 2 7
## 21 0 0 0 4 6
## 22 0 0 0 2 2
## 23 0 0 0 4 5
## 24 0 0 1 2 3
## 25 0 0 0 4 2
## 26 0 0 1 3 3
## 27 0 0 0 1 4
## 28 0 0 1 1 8
## 29 0 0 0 2 4
## 30 0 0 0 1 2
## 31 0 0 0 3 1
## 32 0 0 0 1 2
## 33 0 0 0 2 4
## 34 0 0 0 0 5
## 35 0 0 0 4 3
## 36 0 0 0 1 3
## 37 0 0 0 0 2
## 38 0 0 0 1 3
## 39 0 0 0 3 0
## 40 0 0 0 1 1
## 41 0 0 0 2 1
## 42 0 0 0 2 1
## 43 0 0 0 2 1
## 44 0 0 0 1 0
## 45 0 0 0 0 3
## 46 0 0 0 3 2
## 47 0 0 0 0 1
## 48 0 0 0 2 1
## 49 0 0 0 0 2
## 50 0 0 0 2 1
## 51 0 0 0 1 2
## 53 0 0 0 0 1
## 54 0 0 0 1 2
## 56 0 0 0 0 1
## 57 0 0 0 2 0
## 58 0 0 0 2 2
## 60 0 0 0 2 1
## 61 0 0 0 0 1
## 62 0 0 0 1 0
## 63 0 0 0 1 1
## 64 0 0 0 1 0
## 67 0 0 0 0 1
## 70 0 0 0 0 1
## 72 0 0 0 4 0
## 73 0 0 0 1 1
## 75 0 0 0 0 1
## 76 0 0 0 0 2
## 77 0 0 0 0 1
## 79 0 0 0 1 1
## 82 0 0 0 1 1
## 85 0 0 0 0 1
## 86 0 0 0 1 2
## 89 0 0 0 1 0
## 93 0 0 0 0 1
## 94 0 0 0 1 0
## 97 0 0 0 1 1
## 98 0 0 0 1 0
## 103 0 0 0 1 1
## 115 0 0 0 0 1
## 116 0 0 0 0 1
## 122 0 0 0 1 0
## 124 0 0 0 0 1
## 126 0 0 0 0 1
## 129 0 0 0 1 0
## 141 0 0 0 0 1
## 144 0 0 0 1 0
## 145 0 0 0 0 1
## 160 0 0 0 0 1
## 174 0 0 0 1 0
## 196 0 0 0 0 1
## 201 0 0 0 1 0
## 226 0 0 0 1 0
A THREE-WAY CONTINGENCY TABLE - To check how overall satisfaction depends upon the number of people accommodated and the number of bedrooms available
Table5 <- xtabs(~ accommodates+bedrooms+overall_satisfaction, data = Data.df)
Table5
## , , overall_satisfaction = 2.5
##
## bedrooms
## accommodates 0 1 2 3 4 6
## 1 0 0 0 0 0 0
## 2 0 1 0 0 0 0
## 3 0 1 0 0 0 0
## 4 0 0 1 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## 7 0 0 0 0 0 0
## 8 0 0 0 0 0 0
## 10 0 0 0 0 0 0
## 12 0 0 0 0 0 0
## 15 0 0 0 0 0 0
##
## , , overall_satisfaction = 3.5
##
## bedrooms
## accommodates 0 1 2 3 4 6
## 1 0 0 0 0 0 0
## 2 0 4 0 0 0 0
## 3 0 0 0 0 0 0
## 4 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## 7 0 0 0 0 0 0
## 8 0 0 0 0 0 0
## 10 0 0 0 0 0 0
## 12 0 0 0 0 0 0
## 15 0 0 0 0 0 0
##
## , , overall_satisfaction = 4
##
## bedrooms
## accommodates 0 1 2 3 4 6
## 1 0 0 0 0 0 0
## 2 0 8 0 0 0 0
## 3 0 0 0 0 0 0
## 4 0 1 1 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 2 0 0
## 7 0 0 0 0 0 0
## 8 0 0 0 0 0 0
## 10 0 0 0 0 0 0
## 12 0 0 0 0 0 0
## 15 0 0 0 0 0 0
##
## , , overall_satisfaction = 4.5
##
## bedrooms
## accommodates 0 1 2 3 4 6
## 1 0 17 0 0 0 0
## 2 2 128 0 0 0 0
## 3 0 10 5 0 0 0
## 4 1 8 18 1 0 0
## 5 0 1 3 2 0 0
## 6 0 0 1 1 1 0
## 7 0 0 1 3 3 0
## 8 0 0 0 0 1 0
## 10 0 0 0 0 0 0
## 12 0 1 0 0 1 0
## 15 0 0 0 0 0 1
##
## , , overall_satisfaction = 5
##
## bedrooms
## accommodates 0 1 2 3 4 6
## 1 0 10 0 0 0 0
## 2 2 166 0 0 0 0
## 3 0 14 2 0 0 0
## 4 0 19 22 2 0 0
## 5 0 1 7 5 0 0
## 6 0 3 5 7 1 0
## 7 0 0 0 1 1 0
## 8 0 0 0 3 5 0
## 10 0 0 0 0 1 0
## 12 0 0 0 0 0 0
## 15 0 0 0 0 0 0
BOXPLOTS -
boxplot(Data.df$accommodates, xlab = "Number of people accommodated", main = "Accommodation", horizontal = TRUE)
boxplot(Data.df$reviews, xlab = "Number of reviews", main = "Number of People who give reviews", horizontal = TRUE)
boxplot(Data.df$bedrooms, xlab = "Number of bedrooms", main = "Number of bedrooms usually rented out", horizontal = TRUE)
boxplot(Data.df$minstay, xlab = "Number of days", main = "Period of stay", horizontal = TRUE)
boxplot(Data.df$price, xlab = "Price", main = "Price", horizontal = TRUE)
boxplot(Data.df$overall_satisfaction, xlab = "Rating", main = "Rating giving overall satisfaction", horizontal = TRUE)
HISTOGRAMS -
hist(Data.df$price, xlab = "Price", ylab = "Count of people charging different prices", main = "Price distribution")
hist(Data.df$overall_satisfaction, xlab = "Rating ", ylab = "People who gave the rating", main = "Overall Satisfaction level")
hist(Data.df$accommodates, xlab = "Number that renters usually accommodate", ylab = "Renters", main = "Count of accommodation")
SCATTER PLOTS -
plot(Data.df$price,Data.df$minstay, xlab = "Price", ylab = "Minimum Stay", main = "Relation between price and minimum stay")
CORRELATION MATRIX
library(corrplot)
## corrplot 0.84 loaded
corrplot(corr=cor(Data.df[ , c(6:11)], use = "complete.obs"), method = "ellipse" )
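With use = "complete.obs", cor drops every row that has a missing value in any of the selected columns, so the correlations are computed on fewer rows than the full dataset (overall_satisfaction and minstay both contain NAs). The sketch below illustrates this on a toy data frame with made-up values.

```r
# Toy data frame standing in for columns of Data.df (values are made up)
toy <- data.frame(
  reviews              = c(3, 10, 0, 25, 8),
  overall_satisfaction = c(4.5, 5, NA, 4.5, 5),
  price                = c(60, 90, 45, 120, 75),
  minstay              = c(1, 2, 1, NA, 3)
)

# Number of rows that actually enter the correlation
n_used <- sum(complete.cases(toy))

# Correlation matrix on complete rows only
cor_matrix <- cor(toy, use = "complete.obs")

n_used
round(cor_matrix, 2)
```

Running sum(complete.cases(...)) on the six columns used in the corrplot call would show how many of the 858 listings the matrix is really based on.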
CORRGRAM
library(corrgram)
corrgram(Data.df, order=TRUE, lower.panel=panel.shade, upper.panel=panel.pie, text.panel=panel.txt, main="Corrgram")
SCATTERPLOT MATRIX
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
scatterplotMatrix(formula = ~ reviews + overall_satisfaction + accommodates + bedrooms + price + minstay, cex = 0.6, data = Data.df, diagonal = "histogram")
HYPOTHESIS TESTING FOR THE HOST SIDE -
REGRESSION MODEL -
Here, we are testing a very simple model using regression. The proposed model is
$$\mathrm{Price} = \beta_0 + \beta_1 \cdot \mathrm{accommodates} + \beta_2 \cdot \mathrm{bedrooms} + \beta_3 \cdot \mathrm{minstay} + \varepsilon$$
model1 <- lm(price ~ accommodates + bedrooms + minstay, data = Data.df)
summary(model1)
##
## Call:
## lm(formula = price ~ accommodates + bedrooms + minstay, data = Data.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -166.70 -28.79 -8.79 14.35 684.21
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.1285 4.0137 2.773 0.00569 **
## accommodates 13.3417 1.9558 6.822 1.75e-11 ***
## bedrooms 29.3920 4.0012 7.346 4.96e-13 ***
## minstay -1.4104 0.8775 -1.607 0.10840
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 56.74 on 816 degrees of freedom
## (38 observations deleted due to missingness)
## Multiple R-squared: 0.4221, Adjusted R-squared: 0.4199
## F-statistic: 198.6 on 3 and 816 DF, p-value: < 2.2e-16
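We regressed the price charged by a host on the basic facilities that the host provides: number of people accommodated, number of bedrooms and minimum stay required. The coefficients for accommodates and bedrooms are positive and highly significant (p < 0.001), so there is empirical support for the claim that the rent set by the host depends on these two factors. The coefficient for minstay is not significant at the 5% level (p = 0.108). Overall, the model explains about 42% of the variation in price (adjusted R-squared = 0.4199).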
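To make the fitted model concrete, the sketch below plugs the coefficient estimates from the summary output into the regression equation for a hypothetical listing. The listing (two guests, one bedroom, one-night minimum stay) is an illustrative assumption and not a row from the dataset.

```r
# Coefficient estimates copied from the summary(model1) output
coefs <- c(intercept = 11.1285, accommodates = 13.3417,
           bedrooms = 29.3920, minstay = -1.4104)

# Hypothetical listing: sleeps 2, one bedroom, one-night minimum stay
listing <- c(intercept = 1, accommodates = 2, bedrooms = 1, minstay = 1)

# Predicted nightly price = intercept + sum of coefficient * value
predicted_price <- sum(coefs * listing)
predicted_price
```

This gives a predicted price of about 65.79. With the fitted model object available, predict(model1, newdata = ...) would perform the same calculation at full coefficient precision.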
HYPOTHESIS TESTING FOR THE GUEST SIDE -
In this section, we try to find out what difference, if any, exists between listings whose overall satisfaction was rated 4.5 and those rated 5. This can help the company and the hosts figure out which ratings indicate that a significant improvement is necessary, and where, which in turn helps hosts, and ultimately Airbnb, to improve the quality of service offered. In this part, we mainly focus on whether listings rated 4.5 and listings rated 5 differ significantly in terms of price.
Creating a subset from the main dataset of the listings with an overall satisfaction rating of 4.5 or 5 -
Data1.df <- subset(Data.df, overall_satisfaction == 4.5 | overall_satisfaction == 5.0)
Now that overall_satisfaction has only two grouping levels (4.5 and 5), we can compare the two groups with a t-test.
T-TEST -
Hypothesis : There’s a significant difference in mean price between listings rated 5.0 and listings rated 4.5.
Null Hypothesis : There’s no significant difference in mean price between listings rated 5.0 and listings rated 4.5.
t.test(price ~ overall_satisfaction, data=Data1.df)
##
## Welch Two Sample t-test
##
## data: price by overall_satisfaction
## t = -1.8554, df = 477.34, p-value = 0.06415
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -20.6776332 0.5926409
## sample estimates:
## mean in group 4.5 mean in group 5
## 79.25714 89.29964
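Since the p-value (0.064) is greater than 0.05, we cannot reject our null hypothesis at the 5% level. Thus, the mean prices of listings rated 4.5 (about 79) and listings rated 5.0 (about 89) do not differ significantly. We take this as an indication that guests who rate satisfaction as 4.5 or 5.0 pay for, and experience, a broadly similar level of comfort and hospitality.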
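The output is labelled "Welch Two Sample t-test" because t.test does not assume equal variances by default. As a rough illustration of what that statistic is, the sketch below computes it by hand on made-up prices and compares it with the built-in function. The two price vectors are hypothetical and not taken from the dataset.

```r
# Made-up nightly prices for two rating groups (not from the dataset)
price_45 <- c(60, 75, 82, 90, 68, 101)
price_50 <- c(70, 95, 88, 120, 79, 110)

# Welch t statistic: difference in means divided by its standard error,
# where each group contributes its own variance
se <- sqrt(var(price_45) / length(price_45) +
           var(price_50) / length(price_50))
t_manual <- (mean(price_45) - mean(price_50)) / se

# Built-in Welch test (var.equal = FALSE is the default)
result <- t.test(price_45, price_50)

t_manual
result$statistic
```

The two values agree, which shows that the sign of t simply reflects the order of the groups: a negative t, as in the report, means the first group (4.5) has the lower mean price.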
In the next part of our analysis, we use the one-way contingency table table6 to find out which neighbourhoods have the most listings.
table6
## neighborhood
## Adelaide Aro Street-Nairn Street
## 4 28
## Awarua Berhampore East
## 11 7
## Berhampore West Brooklyn
## 8 23
## Brooklyn South Churton Park North
## 5 9
## Churton Park South Crofton Downs
## 2 3
## Greenacres Grenada Village
## 1 1
## Happy Valley-Owhiro Bay Hataitai North
## 3 15
## Horokiwi Island Bay East
## 2 12
## Island Bay West Johnsonville Central
## 15 7
## Johnsonville East Johnsonville North
## 1 3
## Kaiwharawhara Karaka Bay-Worser Bay
## 2 8
## Karori East Karori North
## 9 6
## Karori Park Karori South
## 8 12
## Kelburn Khandallah Park-Broadmeadows
## 17 2
## Kilbirnie East Kilbirnie West-Hataitai South
## 4 22
## Kingston-Mornington Lambton
## 3 41
## Linden Lyall Bay-Airport-Moa Point
## 3 15
## Makara-Ohariu Maupuia
## 10 2
## Melrose-Houghton Bay-Southgate Miramar
## 35 6
## Miramar North Miramar South
## 6 6
## Miramar West Mitchelltown
## 1 3
## Mt Cook-Wallace Street Mt Victoria West
## 48 79
## Newlands East Newlands North
## 3 3
## Newlands South Newtown East
## 2 24
## Newtown West Ngaio South
## 13 6
## Ngauranga East Northland
## 2 13
## Northland North Oriental Bay
## 4 9
## Paparangi Paparangi West
## 1 1
## Rangoon Heights Raroa
## 12 10
## Roseneath Seatoun
## 18 13
## Seatoun Tunnel West Strathmore Park
## 3 11
## Tawa Central Tawa South
## 2 4
## Te Kainga Thorndon-Tinakori Road
## 12 29
## Vogeltown Vogeltown West
## 4 5
## Wadestown Willis Street-Cambridge Terrace
## 17 116
## Wilton Woodridge
## 12 1
We can see that Willis Street-Cambridge Terrace has the maximum number of listings (116). Possible factors include positive reviews due to good hospitality, relative safety, a more attractive setting and reasonable prices. Creating a subset of these listings only -
Data2.df <- subset(Data.df, neighborhood == "Willis Street-Cambridge Terrace")
table1_1 <- with(Data2.df, table(overall_satisfaction))
table1_1
## overall_satisfaction
## 4.5 5
## 34 43
This gives us 77 listings with very positive ratings, out of the 116 listings in this neighbourhood. However, we must exclude the listings whose rating is “NA”. To find out the total number of valid ratings -
table1_2 <- with(Data2.df, table(overall_satisfaction == 4.5| overall_satisfaction == 5.0 ))
table1_2
##
## TRUE
## 77
Therefore, this table tells us that only 77 of the 116 listings have a rating so far, and that all of those ratings are 4.5 or 5.0. The remaining 39 listings have no rating.
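Rated and unrated listings can also be counted directly with is.na, which avoids relying on table silently dropping NA values. The sketch below uses a made-up ratings vector purely for illustration.

```r
# Made-up ratings for six listings; NA means no rating yet
ratings <- c(4.5, 5, NA, 5, NA, 4.5)

# Count rated and unrated listings explicitly
n_listings <- length(ratings)
n_rated    <- sum(!is.na(ratings))
n_unrated  <- sum(is.na(ratings))

# Show the NA group in the table instead of dropping it
table(ratings, useNA = "ifany")
```

Applied to Data2.df$overall_satisfaction, the same counts would separate the rated listings from those without a rating in one step.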
From our t-test, we know that listings rated 4.5 and listings rated 5.0 show no significant difference in terms of price. Price, however, according to our regression model, depends on the facilities provided by the host. Thus, we can infer that Willis Street-Cambridge Terrace, with the highest number of listings and exceptionally positive ratings, is a strong candidate for the best neighbourhood for renting among those surveyed.
This project aimed at analysing the services offered through Airbnb in Wellington, the capital of New Zealand. The purpose was to find out the factors on which the prices set by the hosts depend; this, along with the overall satisfaction of the guests, can help to improve the quality of the services offered and improve business. We found that prices depend mainly on the facilities offered, namely the number of bedrooms and the number of people accommodated, while the minimum stay required was not significant at the 5% level. Listings rated 4.5 and listings rated 5.0 show no significant difference in price. Finally, we found that the neighbourhood of Willis Street-Cambridge Terrace has generated exceptionally positive ratings. Thus, Airbnb can take it as a model for improving other existing lodgings for rent, which could ultimately help grow the business there and elsewhere.