Since 2008, guests and hosts have used Airbnb to expand travel possibilities and to offer a more unique, personalized way of experiencing the world. Compared with hotels, Airbnb has a price advantage and gives travelers more choice. This dataset describes the listing activity and metrics in NYC, NY for 2019. We will explore the data and look for interesting findings.
####Data Preparation
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(stringr)
library(ggplot2)
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
airbnb<-read.csv("https://raw.githubusercontent.com/DaisyCai2019/Homework/master/AB_NYC_2019.csv")
# We are only interested in listings whose last review was in 2019
airbnb<-separate(airbnb,last_review,c("Year","Month","Day"))
## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 10052 rows
## [3, 20, 27, 37, 39, 194, 205, 261, 266, 268, 277, 346, 350, 391, 426, 433,
## 438, 487, 546, 586, ...].
airbnb19<-filter(airbnb,Year=="2019")
head(airbnb19)
## id name host_id host_name
## 1 2595 Skylit Midtown Castle 2845 Jennifer
## 2 3831 Cozy Entire Floor of Brownstone 4869 LisaRoxanne
## 3 5099 Large Cozy 1 BR Apartment In Midtown East 7322 Chris
## 4 5178 Large Furnished Room Near B'way 8967 Shunichi
## 5 5238 Cute & Cozy Lower East Side 1 bdrm 7549 Ben
## 6 5295 Beautiful 1br on Upper West Side 7702 Lena
## neighbourhood_group neighbourhood latitude longitude room_type
## 1 Manhattan Midtown 40.75362 -73.98377 Entire home/apt
## 2 Brooklyn Clinton Hill 40.68514 -73.95976 Entire home/apt
## 3 Manhattan Murray Hill 40.74767 -73.97500 Entire home/apt
## 4 Manhattan Hell's Kitchen 40.76489 -73.98493 Private room
## 5 Manhattan Chinatown 40.71344 -73.99037 Entire home/apt
## 6 Manhattan Upper West Side 40.80316 -73.96545 Entire home/apt
## price minimum_nights number_of_reviews Year Month Day reviews_per_month
## 1 225 1 45 2019 05 21 0.38
## 2 89 1 270 2019 07 05 4.64
## 3 200 3 74 2019 06 22 0.59
## 4 79 2 430 2019 06 24 3.47
## 5 150 1 160 2019 06 09 1.33
## 6 135 5 53 2019 06 22 0.43
## calculated_host_listings_count availability_365
## 1 2 355
## 2 1 194
## 3 1 129
## 4 1 220
## 5 4 188
## 6 1 6
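The `separate()` call above warns because 10,052 rows do not split into three pieces. As a minimal sketch of an alternative, the year can be read from the first four characters of the date string, assuming the dates start with a four-digit year (which the `Year`, `Month`, `Day` columns above suggest). The `toy` data frame and its values are made up for illustration; base R is used so the snippet runs without extra packages.

```r
# Toy data standing in for the real listings (values are invented)
toy <- data.frame(
  id = 1:4,
  last_review = c("2019-05-21", "2018-10-19", "", "2019-07-05"),
  stringsAsFactors = FALSE
)

# Take the first four characters as the year; an empty string stays empty
toy$Year <- substr(as.character(toy$last_review), 1, 4)

# Keep only rows whose last review was in 2019
toy19 <- toy[toy$Year == "2019", ]
print(toy19)
```

Rows with no review date simply fail the `Year == "2019"` test, so no warning is raised and no `NA` handling is needed.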
We want to know which neighbourhood was the most popular in 2019.
####What are the cases, and how many are there?
Each case is one Airbnb listing: a host and their apartment or room. After filtering to 2019, there are 25,209 cases.
####Describe the method of data collection.
I collected the data from Kaggle, where the Airbnb dataset was published two months ago.
####What type of study is this (observational/experiment)?
This is an observational study.
####Data Source: If you collected the data, state self-collected. If not, provide a citation/link.
Data source: https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data
####Response: What is the response variable, and what type is it (numerical/categorical)?
The response variable is number_of_reviews and is numerical.
####Explanatory: What is the explanatory variable(s), and what type is it (numerical/categorical)?
The explanatory variable is neighbourhood_group, and it is categorical.
summary(airbnb19)
## id name
## Min. : 2595 Home away from home : 10
## 1st Qu.:12022728 Loft Suite @ The Box House Hotel: 10
## Median :22343909 Beautiful Brooklyn Brownstone : 5
## Mean :20689219 Brooklyn Apartment : 5
## 3rd Qu.:30376690 Harlem Gem : 5
## Max. :36455809 Private room : 5
## (Other) :25169
## host_id host_name neighbourhood_group
## Min. : 2571 Michael : 215 Bronx : 698
## 1st Qu.: 8159536 Sonder (NYC): 207 Brooklyn :10466
## Median : 40027302 David : 197 Manhattan :10322
## Mean : 78561358 John : 177 Queens : 3456
## 3rd Qu.:139238261 Alex : 153 Staten Island: 267
## Max. :273841667 Maria : 122
## (Other) :24138
## neighbourhood latitude longitude
## Bedford-Stuyvesant: 2209 Min. :40.51 Min. :-74.24
## Williamsburg : 1853 1st Qu.:40.69 1st Qu.:-73.98
## Harlem : 1435 Median :40.72 Median :-73.95
## Bushwick : 1202 Mean :40.73 Mean :-73.95
## Hell's Kitchen : 1119 3rd Qu.:40.76 3rd Qu.:-73.93
## East Village : 866 Max. :40.91 Max. :-73.71
## (Other) :16525
## room_type price minimum_nights
## Entire home/apt:13266 Min. : 0.0 Min. : 1.000
## Private room :11356 1st Qu.: 69.0 1st Qu.: 1.000
## Shared room : 587 Median : 105.0 Median : 2.000
## Mean : 141.8 Mean : 4.898
## 3rd Qu.: 175.0 3rd Qu.: 4.000
## Max. :7500.0 Max. :365.000
##
## number_of_reviews Year Month
## Min. : 1.00 Length:25209 Length:25209
## 1st Qu.: 5.00 Class :character Class :character
## Median : 18.00 Mode :character Mode :character
## Mean : 40.21
## 3rd Qu.: 53.00
## Max. :629.00
##
## Day reviews_per_month calculated_host_listings_count
## Length:25209 Min. : 0.020 Min. : 1.000
## Class :character 1st Qu.: 0.650 1st Qu.: 1.000
## Mode :character Median : 1.460 Median : 1.000
## Mean : 1.974 Mean : 6.148
## 3rd Qu.: 2.840 3rd Qu.: 2.000
## Max. :58.500 Max. :327.000
##
## availability_365
## Min. : 0.0
## 1st Qu.: 22.0
## Median :116.0
## Mean :146.4
## 3rd Qu.:269.0
## Max. :365.0
##
describe(airbnb19$number_of_reviews)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 25209 40.21 55.32 18 28.39 22.24 1 629 628 2.73 10.89
## se
## X1 0.35
# Mean number of reviews per borough
airbnb_group<-airbnb19%>%
  group_by(neighbourhood_group)%>%
  summarize(Mean = mean(number_of_reviews, na.rm=TRUE))
head(airbnb_group)
## # A tibble: 5 x 2
## neighbourhood_group Mean
## <fct> <dbl>
## 1 Bronx 37.7
## 2 Brooklyn 41.5
## 3 Manhattan 38.5
## 4 Queens 41.9
## 5 Staten Island 41.2
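Because `number_of_reviews` is strongly right-skewed (skew 2.73, mean 40.21 versus median 18 in the `describe()` output), a group mean on its own can be misleading. As a hedged sketch, the median and group size can be reported next to the mean. The `toy` data and the name `by_group` are made up for illustration, and base R is used so the snippet runs without extra packages.

```r
# Toy data: one very popular listing pulls the Queens mean upward (values are invented)
toy <- data.frame(
  neighbourhood_group = c("Queens", "Queens", "Queens", "Bronx", "Bronx"),
  number_of_reviews   = c(2, 4, 300, 10, 20),
  stringsAsFactors = FALSE
)

# Compute mean, median and count for each group
stats <- sapply(
  split(toy$number_of_reviews, toy$neighbourhood_group),
  function(x) c(Mean = mean(x), Median = median(x), N = length(x))
)

# Transpose so that each row is one group
by_group <- as.data.frame(t(stats))
print(by_group)
```

In this toy example the Queens mean is far above its median, which is the kind of gap that a single heavily reviewed listing can create.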
ggplot(airbnb19, aes(x=number_of_reviews)) + geom_histogram(binwidth = 20)
ggplot(airbnb_group, aes(x=neighbourhood_group, y=Mean) ) +
  geom_bar(stat="identity") +
  theme(axis.text.x = element_text(angle = 10, hjust = 1))
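# Keep only listings with more than 400 reviews; Mean holds their overall average
airbnb_group2<-airbnb19%>%
  filter(number_of_reviews>400)%>%
  mutate(Mean = mean(number_of_reviews))
# stat="identity" stacks the listings, so each bar is the total reviews per neighbourhood
ggplot(airbnb_group2, aes(x=neighbourhood, y=number_of_reviews) ) +
  geom_bar(stat="identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))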
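Although the five boroughs have similar average review counts, Queens has the highest mean (41.9), which suggests that travelers are most interested in rooms and apartments there. Specifically, among listings with more than 400 reviews, East Elmhurst in Queens has the most total reviews, probably because it is close to the airport.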
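The comparison of total reviews can also be checked numerically rather than read off the stacked bars. A minimal sketch, assuming the same "more than 400 reviews" filter: sum `number_of_reviews` per neighbourhood and sort in decreasing order. The `toy` rows and the area names are hypothetical and are not values from the dataset.

```r
# Toy data with hypothetical neighbourhood names and invented review counts
toy <- data.frame(
  neighbourhood     = c("Area A", "Area A", "Area B", "Area B", "Area C"),
  number_of_reviews = c(450, 500, 610, 120, 405),
  stringsAsFactors = FALSE
)

# Same filter as the plot: listings with more than 400 reviews
top <- toy[toy$number_of_reviews > 400, ]

# Total reviews per neighbourhood, largest first
totals <- aggregate(number_of_reviews ~ neighbourhood, data = top, FUN = sum)
totals <- totals[order(-totals$number_of_reviews), ]
print(totals)
```

A sorted table like this makes the ranking explicit and shows how much a neighbourhood's total depends on the number of listings that pass the filter.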