Airbnb is an American company that operates an online marketplace and hospitality service through which people lease or rent short-term lodging, including vacation rentals, apartment rentals, homestays, hostel beds, and hotel rooms. The company does not own any lodging; it acts as a broker and receives percentage service fees from both guests and hosts on every booking. In January 2018 the company had over 3,000,000 lodging listings in 65,000 cities and 191 countries, so it has a presence in cities and countries all over the world.
Our project focuses on Asheville, US, and analyzes 854 Airbnb listings there. Asheville is a city and the county seat of Buncombe County, North Carolina, United States. It is the largest city in Western North Carolina and the 12th-most populous city in the U.S. state of North Carolina.
A stay is far more pleasant for a traveller when the service is good and the customer is satisfied. Customer satisfaction is subjective and depends on a multitude of factors, both qualitative and quantitative, such as room type, price, number of bedrooms, and the number of guests a listing accommodates. Judging the performance of a large number of listings one by one is a tedious task. Therefore, after collecting the relevant data on Airbnb listings in Asheville and applying several statistical analyses and tests, we can gain insight into which factors have the most and the least influence on overall customer satisfaction.
The subject of our project is how overall customer satisfaction with Airbnb listings in Asheville varies with different parameters. The specific objective is to analyze the impact of these parameters on the dependent variable, overall satisfaction. The goal of the analysis is to determine whether room type, price, the number of guests accommodated, and the number of reviews have any impact on overall customer satisfaction. The statistical tools and methods used include the chi-square test, the t-test, correlation, and regression.
2.1 Data
The data used for the analysis is an open data set of listings collected from the Airbnb web site. It is available at https://s3.amazonaws.com/tomslee-airbnb-data-2/asheville.zip . The columns are described below:
1) room_id: A unique number identifying an Airbnb listing. The listing has a URL on the Airbnb web site of http://airbnb.com/rooms/room_id
2) host_id: A unique number identifying an Airbnb host. The host's page has a URL on the Airbnb web site of http://airbnb.com/users/show/host_id
3) survey_id: A unique number identifying the survey of Asheville in which the listing was recorded.
4) room_type: The type of room offered by the listing. This is qualitative data with the values "Entire home/apt", "Private room", or "Shared room".
5) neighborhood: A subregion of the city or search area for which the survey is carried out. For cities that have both, a neighbourhood is smaller than a borough. For some cities there is no neighbourhood information. This is also qualitative data.
6) reviews: The number of reviews that a listing has received. Airbnb has said that 70% of visits end up with a review, so the number of reviews can be used to estimate the number of visits. Note that such an estimate will not be reliable for an individual listing (especially as reviews occasionally vanish from the site), but over a city as a whole it should be a useful metric of traffic. This is quantitative data.
7) accommodates: The number of guests a listing can accommodate. This is quantitative data.
8) bedrooms: The number of bedrooms a listing offers. This is quantitative data.
9) price: The price (in $US) for a one-night stay. In early surveys, there may be some values that were recorded by month. This is quantitative data.
10) overall_satisfaction: The average rating (out of five) that the listing has received from those visitors who left a review. This is quantitative data and the dependent variable of our analysis. We have to check whether the above parameters have any impact on this dependent variable.
2.2 Hypothesis
The main objective of the analysis is to determine whether overall customer satisfaction depends significantly on the factors listed above. The null hypothesis, H0, is that overall satisfaction is independent of all of the above parameters. The alternative hypothesis, H1, is that overall satisfaction depends on at least one of them.
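In terms of the regression model of Section 2.3, these hypotheses can be written out as below. This is only a sketch, assuming that the test of interest is the overall F-test of the linear model; the symbols \beta_j and \varepsilon and the indicator names PrivateRoom and SharedRoom are notation introduced here for illustration, not taken from the original analysis.

```latex
\begin{align*}
\text{overall\_satisfaction} &= \beta_0 + \beta_1\,\text{PrivateRoom} + \beta_2\,\text{SharedRoom} + \beta_3\,\text{reviews} \\
&\quad + \beta_4\,\text{accommodates} + \beta_5\,\text{bedrooms} + \beta_6\,\text{price} + \varepsilon \\
H_0 &: \beta_1 = \beta_2 = \dots = \beta_6 = 0 \\
H_1 &: \beta_j \neq 0 \text{ for at least one } j \in \{1, \dots, 6\}
\end{align*}
```

Under this reading, the F-statistic printed at the bottom of the model summary is the statistic that tests H0 against H1.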
2.3 Model
A multiple linear regression model was fitted for the analysis. Of the six estimated coefficients (excluding the intercept), three are significant at the 5% level: room_typePrivate room, reviews, and price. The fitted regression equation is:
overall_satisfaction = 4.0854 - 0.3401 (room_type = Private room) - 0.2331 (room_type = Shared room) + 0.0092 reviews - 0.0537 accommodates + 0.0646 bedrooms - 0.0010 price
# Load the Asheville listings data
hotel <- read.csv("Prashant Project Data.csv")
View(hotel)
reg1 <- lm(overall_satisfaction ~ room_type + reviews + accommodates + bedrooms + price , data = hotel)
summary(reg1)
##
## Call:
## lm(formula = overall_satisfaction ~ room_type + reviews + accommodates +
## bedrooms + price, data = hotel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.7108 -0.1039 0.6725 1.0245 3.1436
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0854118 0.1635207 24.984 < 2e-16 ***
## room_typePrivate room -0.3401051 0.1295452 -2.625 0.00881 **
## room_typeShared room -0.2331303 0.5900378 -0.395 0.69286
## reviews 0.0092462 0.0009294 9.948 < 2e-16 ***
## accommodates -0.0537422 0.0521089 -1.031 0.30267
## bedrooms 0.0645657 0.1164050 0.555 0.57927
## price -0.0010149 0.0003363 -3.018 0.00262 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.645 on 847 degrees of freedom
## Multiple R-squared: 0.1313, Adjusted R-squared: 0.1251
## F-statistic: 21.34 on 6 and 847 DF, p-value: < 2.2e-16
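To make the fitted equation concrete, the sketch below evaluates it by hand for a single listing. The coefficients are copied from the summary output above; the helper name predict_satisfaction and the example listing (built from the median values of reviews, accommodates, bedrooms and price in the appendix) are illustrative assumptions, not part of the original analysis.

```r
# Coefficients copied from the summary(reg1) output above
coefs <- c(
  intercept    = 4.0854118,
  private_room = -0.3401051,
  shared_room  = -0.2331303,
  reviews      = 0.0092462,
  accommodates = -0.0537422,
  bedrooms     = 0.0645657,
  price        = -0.0010149
)

# Hypothetical helper: evaluate the fitted equation for one listing.
# "Entire home/apt" is the reference level, so it adds nothing.
predict_satisfaction <- function(room_type, reviews, accommodates, bedrooms, price) {
  unname(
    coefs["intercept"] +
      coefs["private_room"] * (room_type == "Private room") +
      coefs["shared_room"] * (room_type == "Shared room") +
      coefs["reviews"] * reviews +
      coefs["accommodates"] * accommodates +
      coefs["bedrooms"] * bedrooms +
      coefs["price"] * price
  )
}

# Same listing characteristics, two different room types
entire <- predict_satisfaction("Entire home/apt", 28, 3, 1, 95)
private <- predict_satisfaction("Private room", 28, 3, 1, 95)
print(c(entire = entire, private = private))
print(entire - private)  # equals the size of the Private room coefficient
```

With the data loaded, predict(reg1, newdata = ...) would be the usual way to obtain the same kind of value directly from the fitted model.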
2.4 Results
The 854 Airbnb listings in Asheville, US were analyzed to understand how room type, neighborhood, number of reviews, number of guests accommodated, number of bedrooms, and price affect overall customer satisfaction. Based on the regression output in Section 2.3, the insights are as follows:
1) Room type affects overall satisfaction. Holding the other variables fixed, a private room is rated on average 0.34 points lower than an entire home/apartment (p = 0.009). The coefficient for shared rooms (-0.23) is not statistically significant (p = 0.69); only 8 of the 854 listings are shared rooms.
2) The number of reviews is statistically significant. Each additional review is associated with a rise of about 0.009 points in overall satisfaction (p < 2e-16).
3) Price has a small but statistically significant negative association. Each additional dollar per night is associated with a drop of about 0.001 points in overall satisfaction (p = 0.003).
4) The number of bedrooms (coefficient 0.065, p = 0.58) and the number of guests accommodated (coefficient -0.054, p = 0.30) are not statistically significant once the other variables are in the model.
The analysis of the Airbnb listings in Asheville provided several insights. Among the listing characteristics studied, room type stands out: private rooms are rated significantly lower than entire homes/apartments, which suggests that guests in Asheville care about the type of room when judging their stay. The number of reviews is also positively associated with satisfaction. There is a common notion of cost leadership, i.e. that people prefer lower-priced lodging, but in our analysis the effect of price, although statistically significant, is very small per dollar. Because the multiple R-squared is only 0.1313, the model explains about 13% of the variation in overall satisfaction, so other parameters outside our study are likely to have a significant impact on customer satisfaction.
1. https://en.wikipedia.org/wiki/Airbnb
2. https://en.wikipedia.org/wiki/Asheville,_North_Carolina
3. http://tomslee.net/airbnb-data-collection-get-the-data
4. https://drive.google.com/open?id=11Bm-UlfH9bYXGhAWi1vCB5lyYnzZa0_Q
5. https://drive.google.com/open?id=1A09_AvoL4UHBD8lCUad5RET-XpLBWmz6
6. https://drive.google.com/open?id=147GjCcshKNC0qnqPPhVDd1UamYUzpUms
5. Appendix
# Load the Asheville listings data
hotel <- read.csv("Prashant Project Data.csv")
View(hotel)
dim(hotel)
## [1] 854 10
-> The data set has 854 rows and 10 columns.
library(psych)
describe(hotel)
## vars n mean sd median trimmed
## room_id 1 854 11672573.46 5970259.24 13329838 12044585.16
## survey_id 2 854 1498.00 0.00 1498 1498.00
## host_id 3 854 37877448.52 38428065.69 22920130 32332288.35
## room_type* 4 854 1.41 0.51 1 1.38
## neighborhood* 5 854 1.09 0.29 1 1.00
## reviews 6 854 49.11 61.11 28 37.36
## overall_satisfaction 7 854 4.18 1.76 5 4.60
## accommodates 8 854 3.41 1.96 3 3.08
## bedrooms 9 854 1.35 0.84 1 1.25
## price 10 854 126.62 202.38 95 104.00
## mad min max range skew kurtosis
## room_id 6822685.02 67870 19912932 19845062 -0.45 -1.14
## survey_id 0.00 1498 1498 0 NaN NaN
## host_id 29911610.67 62667 141036151 140973484 1.01 -0.11
## room_type* 0.00 1 3 2 0.58 -1.17
## neighborhood* 0.00 1 3 2 2.95 7.28
## reviews 35.58 0 602 602 2.64 12.03
## overall_satisfaction 0.00 0 5 5 -1.93 1.78
## accommodates 1.48 1 17 16 2.09 6.94
## bedrooms 0.00 0 10 10 2.60 16.51
## price 44.48 20 5000 4980 18.02 405.72
## se
## room_id 204298.07
## survey_id 0.00
## host_id 1314981.34
## room_type* 0.02
## neighborhood* 0.01
## reviews 2.09
## overall_satisfaction 0.06
## accommodates 0.07
## bedrooms 0.03
## price 6.93
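The se column in the describe() output above is the standard error of the mean, i.e. sd / sqrt(n). As a quick check, the sketch below recomputes it from the sd values printed above; the variable names sds and se are introduced here for illustration only.

```r
n <- 854  # number of listings

# Standard deviations copied from the describe() output above
sds <- c(reviews = 61.11, overall_satisfaction = 1.76,
         accommodates = 1.96, bedrooms = 0.84, price = 202.38)

# Standard error of the mean: sd / sqrt(n), rounded as in the output
se <- round(sds / sqrt(n), 2)
print(se)
```

The recomputed values can be compared against the se column for the same five variables.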
summary(hotel)
## room_id survey_id host_id
## Min. : 67870 Min. :1498 Min. : 62667
## 1st Qu.: 6413734 1st Qu.:1498 1st Qu.: 6453926
## Median :13329838 Median :1498 Median : 22920130
## Mean :11672573 Mean :1498 Mean : 37877449
## 3rd Qu.:16856088 3rd Qu.:1498 3rd Qu.: 58634762
## Max. :19912932 Max. :1498 Max. :141036151
## room_type neighborhood reviews
## Entire home/apt:512 Asheville :776 Min. : 0.00
## Private room :334 Formerly ETJ : 77 1st Qu.: 8.00
## Shared room : 8 Richmond Hill: 1 Median : 28.00
## Mean : 49.11
## 3rd Qu.: 65.00
## Max. :602.00
## overall_satisfaction accommodates bedrooms price
## Min. :0.00 Min. : 1.000 Min. : 0.000 Min. : 20.0
## 1st Qu.:4.50 1st Qu.: 2.000 1st Qu.: 1.000 1st Qu.: 70.0
## Median :5.00 Median : 3.000 Median : 1.000 Median : 95.0
## Mean :4.18 Mean : 3.412 Mean : 1.352 Mean : 126.6
## 3rd Qu.:5.00 3rd Qu.: 4.000 3rd Qu.: 2.000 3rd Qu.: 139.0
## Max. :5.00 Max. :17.000 Max. :10.000 Max. :5000.0
3) One-way contingency table for room type
table(hotel$room_type)
##
## Entire home/apt Private room Shared room
## 512 334 8
hotel_table_two<-xtabs(~ room_type + price, data = hotel)
addmargins(hotel_table_two)
## price
## room_type 20 26 28 29 30 32 33 34 35 36 37 38 39 40
## Entire home/apt 0 0 0 0 0 0 0 0 0 0 1 0 1 0
## Private room 1 0 1 1 2 1 2 4 4 1 0 2 2 10
## Shared room 1 4 0 0 0 0 0 0 0 0 0 0 0 0
## Sum 2 4 1 1 2 1 2 4 4 1 1 2 3 10
## price
## room_type 42 44 45 46 47 48 49 50 51 52 53 54 55 57
## Entire home/apt 1 0 0 1 1 1 2 1 1 1 0 1 4 1
## Private room 3 7 11 3 2 2 4 8 1 2 1 3 15 4
## Shared room 0 0 1 0 0 0 0 0 0 0 0 0 0 0
## Sum 4 7 12 4 3 3 6 9 2 3 1 4 19 5
## price
## room_type 58 59 60 61 62 63 64 65 66 67 68 69 70 71
## Entire home/apt 0 1 2 0 0 0 0 4 0 1 5 4 3 2
## Private room 7 1 12 2 3 4 2 23 1 2 4 5 14 0
## Shared room 0 0 0 0 0 0 0 0 0 1 0 0 0 0
## Sum 7 2 14 2 3 4 2 27 1 4 9 9 17 2
## price
## room_type 72 73 74 75 76 77 78 79 80 81 82 83 84 85
## Entire home/apt 2 1 3 11 1 2 2 9 12 2 0 1 0 14
## Private room 1 1 0 17 0 2 2 11 13 0 1 1 1 24
## Shared room 0 0 0 0 1 0 0 0 0 0 0 0 0 0
## Sum 3 2 3 28 2 4 4 20 25 2 1 2 1 38
## price
## room_type 86 87 88 89 90 91 92 93 94 95 97 98 99 100
## Entire home/apt 2 1 5 16 8 1 1 2 3 17 3 2 21 20
## Private room 0 1 1 4 8 0 2 0 0 9 0 1 6 14
## Shared room 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Sum 2 2 6 20 16 1 3 2 3 26 3 3 27 34
## price
## room_type 102 104 105 106 107 108 109 110 111 112 114 115 116 118
## Entire home/apt 0 2 6 1 3 1 5 13 3 2 4 13 1 1
## Private room 1 0 3 1 0 0 0 0 0 0 0 0 0 0
## Shared room 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Sum 1 2 9 2 3 1 5 13 3 2 4 13 1 1
## price
## room_type 119 120 123 125 127 128 129 130 134 135 139 140 142 143
## Entire home/apt 7 9 1 21 0 2 1 10 3 10 4 9 1 1
## Private room 0 1 0 1 1 0 2 1 0 2 0 1 0 0
## Shared room 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Sum 7 10 1 22 1 2 3 11 3 12 4 10 1 1
## price
## room_type 144 145 146 147 149 150 154 155 158 159 160 164 165 167
## Entire home/apt 1 0 3 1 3 20 1 1 2 2 4 1 2 1
## Private room 1 2 0 1 1 2 0 0 0 2 1 0 1 0
## Shared room 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Sum 2 2 3 2 4 22 1 1 2 4 5 1 3 1
## price
## room_type 169 170 175 177 179 180 183 185 189 190 192 195 198 199
## Entire home/apt 1 0 17 1 1 4 1 3 3 5 1 3 1 5
## Private room 0 1 0 0 0 0 0 0 0 0 0 0 0 1
## Shared room 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Sum 1 1 17 1 1 4 1 3 3 5 1 3 1 6
## price
## room_type 200 205 209 210 219 224 225 229 240 245 249 250 259 262
## Entire home/apt 11 1 0 1 1 1 8 1 1 4 1 17 0 2
## Private room 1 0 3 0 0 0 1 0 0 0 1 0 2 0
## Shared room 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Sum 12 1 3 1 1 1 9 1 1 4 2 17 2 2
## price
## room_type 265 275 288 289 290 295 300 315 325 329 330 350 375 395
## Entire home/apt 1 5 1 0 1 2 3 1 3 0 1 2 1 1
## Private room 0 0 0 1 0 0 0 0 0 1 0 0 0 0
## Shared room 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Sum 1 5 1 1 1 2 3 1 3 1 1 2 1 1
## price
## room_type 400 425 450 465 475 485 540 600 930 1250 2222 5000 Sum
## Entire home/apt 4 2 3 1 1 1 1 1 1 1 1 1 512
## Private room 0 0 0 0 0 0 0 0 0 0 0 0 334
## Shared room 0 0 0 0 0 0 0 0 0 0 0 0 8
## Sum 4 2 3 1 1 1 1 1 1 1 1 1 854
boxplot(hotel$accommodates, horizontal = TRUE, main = "Box plot for accommodates", xlab = "accommodates", col = "blue")
boxplot(hotel$overall_satisfaction, horizontal = TRUE, main = "Box plot for overall satisfaction", xlab = "overall satisfaction", col = "green")
7)Bar Graph for overall satisfaction of the customers
table(hotel$overall_satisfaction)
##
## 0 4 4.5 5
## 127 8 115 604
overall_satisf <-table(hotel$overall_satisfaction)
barplot(overall_satisf, width = 0.5, space = 1, main = "Overall satisfaction of customers",
        xlab = "satisfaction level (0 = lowest, 5 = highest)",
        col = c("yellow", "green", "blue", "black"), ylim = c(0, 860), xlim = c(0, 10),
        names.arg = c("0", "4", "4.5", "5"))
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
scatterplotMatrix(~room_type+reviews+overall_satisfaction+accommodates+bedrooms+price, data=hotel, main="Variation of customer Satisfaction with room_type, reviews, accommodates, bedrooms, price")
chi1 <- xtabs (~ overall_satisfaction + room_type, data=hotel)
chisq.test(chi1)
## Warning in chisq.test(chi1): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: chi1
## X-squared = 31.374, df = 6, p-value = 2.151e-05
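-> Here the p-value (2.151e-05) is less than 0.05, so the null hypothesis of independence is rejected in favour of the alternative: room type and overall satisfaction are associated. Because of the warning that the chi-squared approximation may be incorrect, which arises when some expected cell counts are small, this result should be read with some caution.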
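The warning printed by chisq.test() can be understood from the marginal totals alone, because the expected count of each cell under independence is row total * column total / n. The sketch below uses only the room type and overall satisfaction counts already tabulated earlier in this appendix; the variable names and the threshold of 5 (a common rule of thumb) are assumptions made for illustration.

```r
# Marginal counts reported in the tables earlier in this appendix
satisfaction <- c("0" = 127, "4" = 8, "4.5" = 115, "5" = 604)
room_type <- c("Entire home/apt" = 512, "Private room" = 334, "Shared room" = 8)
n <- sum(room_type)  # 854 listings

# Expected count under independence: row total * column total / n
expected <- outer(satisfaction, room_type) / n
print(round(expected, 2))

# Number of cells below the rule-of-thumb threshold of 5
small_cells <- sum(expected < 5)
print(small_cells)
```

With the data loaded, chisq.test(chi1)$expected returns the same matrix. When many expected counts are small, chisq.test(chi1, simulate.p.value = TRUE) is one option for obtaining a p-value that does not rely on the chi-squared approximation.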
chi2 <- xtabs (~ overall_satisfaction + neighborhood, data=hotel)
chisq.test(chi2)
## Warning in chisq.test(chi2): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: chi2
## X-squared = 9.2236, df = 6, p-value = 0.1614
-> Here the p-value (0.1614) is not less than 0.05, which means that neighborhood is not a statistically significant factor for overall satisfaction.
t.test(hotel$overall_satisfaction, hotel$reviews)
##
## Welch Two Sample t-test
##
## data: hotel$overall_satisfaction and hotel$reviews
## t = -21.476, df = 854.41, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -49.03783 -40.82517
## sample estimates:
## mean of x mean of y
## 4.179742 49.111241
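-> Here the p-value is less than 0.05, so the null hypothesis of equal means is rejected: the mean of overall_satisfaction (4.18) differs from the mean number of reviews (49.11). Note that this test compares the averages of two variables measured on different scales; on its own it does not show that reviews influence overall satisfaction. That question is addressed by the correlation and regression analyses later in this appendix.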
t.test(hotel$overall_satisfaction, hotel$accommodates)
##
## Welch Two Sample t-test
##
## data: hotel$overall_satisfaction and hotel$accommodates
## t = 8.5144, df = 1685.9, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.5907490 0.9443798
## sample estimates:
## mean of x mean of y
## 4.179742 3.412178
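-> Here also the p-value is less than 0.05, so the null hypothesis of equal means is rejected: the mean of overall_satisfaction (4.18) differs from the mean of accommodates (3.41). This is a comparison of means only and does not by itself establish that accommodates influences overall satisfaction.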
13) T-test comparing overall satisfaction with the number of bedrooms
t.test(hotel$overall_satisfaction, hotel$bedrooms)
##
## Welch Two Sample t-test
##
## data: hotel$overall_satisfaction and hotel$bedrooms
## t = 42.393, df = 1223.7, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.696440 2.958127
## sample estimates:
## mean of x mean of y
## 4.179742 1.352459
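-> Here also the p-value is less than 0.05, so the null hypothesis of equal means is rejected: the mean of overall_satisfaction (4.18) differs from the mean number of bedrooms (1.35). This is a comparison of means only and does not by itself establish that the number of bedrooms influences overall satisfaction.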
t.test(hotel$overall_satisfaction, hotel$price)
##
## Welch Two Sample t-test
##
## data: hotel$overall_satisfaction and hotel$price
## t = -17.679, df = 853.13, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -136.0308 -108.8439
## sample estimates:
## mean of x mean of y
## 4.179742 126.617096
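-> Here also the p-value is less than 0.05, so the null hypothesis of equal means is rejected: the mean of overall_satisfaction (4.18) differs from the mean price (126.62). This is a comparison of means only and does not by itself establish that price influences overall satisfaction.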
cor(hotel[,6:10])
## reviews overall_satisfaction accommodates
## reviews 1.0000000 0.33413467 -0.11658640
## overall_satisfaction 0.3341347 1.00000000 -0.09202409
## accommodates -0.1165864 -0.09202409 1.00000000
## bedrooms -0.1143018 -0.09009348 0.79817570
## price -0.0931582 -0.14207140 0.51468095
## bedrooms price
## reviews -0.11430182 -0.0931582
## overall_satisfaction -0.09009348 -0.1420714
## accommodates 0.79817570 0.5146809
## bedrooms 1.00000000 0.5437369
## price 0.54373690 1.0000000
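Whether each of these correlations with overall_satisfaction differs significantly from zero can be checked with the usual t-test for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom. The sketch below applies that formula to the values printed in the matrix above; it assumes cor() was run with its default Pearson method, and the variable names are introduced here for illustration.

```r
# Correlations with overall_satisfaction, copied from the cor() output above
r <- c(reviews = 0.33413467, accommodates = -0.09202409,
       bedrooms = -0.09009348, price = -0.14207140)
n <- 854  # number of listings

# t statistic and two-sided p-value for H0: true correlation = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(round(cbind(r, t_stat, p_value), 4))
```

With the data loaded, cor.test(hotel$overall_satisfaction, hotel$reviews) performs the same test for one pair. Note that accommodates and bedrooms are strongly correlated with each other (0.798 in the matrix above), which is one plausible reason why a variable can be correlated with satisfaction on its own yet not be significant in the multiple regression.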
library(corrgram)
corrgram(hotel[,6:10], order=TRUE, lower.panel=panel.shade,upper.panel=panel.pie, text.panel=panel.txt,main="Corrgram for different variables")
# room_type already holds the labels 'Entire home/apt', 'Private room' and 'Shared room'
# (the data has no Res column to recode from), so it only needs converting to a factor
hotel$room_type <- factor(hotel$room_type)
reg1 <- lm(overall_satisfaction ~ room_type + reviews + accommodates + bedrooms + price , data = hotel)
summary(reg1)
##
## Call:
## lm(formula = overall_satisfaction ~ room_type + reviews + accommodates +
## bedrooms + price, data = hotel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.7108 -0.1039 0.6725 1.0245 3.1436
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0854118 0.1635207 24.984 < 2e-16 ***
## room_typePrivate room -0.3401051 0.1295452 -2.625 0.00881 **
## room_typeShared room -0.2331303 0.5900378 -0.395 0.69286
## reviews 0.0092462 0.0009294 9.948 < 2e-16 ***
## accommodates -0.0537422 0.0521089 -1.031 0.30267
## bedrooms 0.0645657 0.1164050 0.555 0.57927
## price -0.0010149 0.0003363 -3.018 0.00262 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.645 on 847 degrees of freedom
## Multiple R-squared: 0.1313, Adjusted R-squared: 0.1251
## F-statistic: 21.34 on 6 and 847 DF, p-value: < 2.2e-16
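The last three lines of the output above are linked by standard formulas: the adjusted R-squared and the F-statistic both follow from the multiple R-squared, the number of observations, and the number of estimated slopes. The sketch below recomputes them from the printed R-squared as a consistency check; the variable names are illustrative, and small differences from the printed values are expected because R-squared is only given to four decimals.

```r
r_squared <- 0.1313  # Multiple R-squared from the output above
n <- 854             # number of listings
p <- 6               # estimated slopes (excluding the intercept)

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Overall F-statistic on p and n - p - 1 degrees of freedom
f_stat <- (r_squared / p) / ((1 - r_squared) / (n - p - 1))
p_value <- pf(f_stat, df1 = p, df2 = n - p - 1, lower.tail = FALSE)

print(round(c(adj_r_squared = adj_r_squared, f_stat = f_stat), 4))
print(p_value)
```

The degrees of freedom used here, 6 and 847, match those printed with the F-statistic.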
-> The p-value of the overall F-test (< 2.2e-16) is less than 0.05, which means that the model as a whole is statistically significant. However, the multiple R-squared is 0.1313, so the factors considered in the regression explain only about 13% of the variation in the dependent variable, overall satisfaction. This implies that the data is insufficient to determine the exact level of customer satisfaction, and that other parameters need to be taken into consideration.
19) Summary
-> The data was analyzed to examine how various parameters affect overall customer satisfaction across the Airbnb listings in the city of Asheville. The analysis found that room type significantly affects overall satisfaction: private rooms are rated lower than entire homes/apartments. The number of reviews and the price are also statistically significant, although the effect of price is very small, while bedrooms and accommodates are not significant in the regression. The multiple R-squared of 0.1313 indicates that parameters not considered in this analysis are needed to explain most of the variation in overall satisfaction.