Since 2008, guests and hosts have used Airbnb to expand travel possibilities and offer a more unique, personalized way of experiencing the world. This dataset describes the listing activity and metrics in NYC, NY for 2019. The data file includes the information needed to learn more about hosts and geographical availability, along with the metrics needed to make predictions and draw conclusions.
I will explore the information in this dataset. First, load all the packages used in the analysis.
library(tidyverse)
library(janitor)
library(leaflet)
library(skimr)
library(knitr)
# Read the listings and convert every character column to lower case
data_raw <- read_csv("new-york-city-airbnb-open-data/AB_NYC_2019.csv") %>%
  mutate_if(is.character, tolower)
I used several packages for this task: tidyverse, janitor, skimr, leaflet, and knitr.
data_raw %>%
  select(name, host_name,
         neighbourhood_group, room_type,
         latitude, longitude) %>%
  head() %>%
  kable()
| name | host_name | neighbourhood_group | room_type | latitude | longitude |
|---|---|---|---|---|---|
| clean & quiet apt home by the park | john | brooklyn | private room | 41 | -74 |
| skylit midtown castle | jennifer | manhattan | entire home/apt | 41 | -74 |
| the village of harlem….new york ! | elisabeth | manhattan | private room | 41 | -74 |
| cozy entire floor of brownstone | lisaroxanne | brooklyn | entire home/apt | 41 | -74 |
| entire apt: spacious studio/loft by central park | laura | manhattan | entire home/apt | 41 | -74 |
| large cozy 1 br apartment in midtown east | chris | manhattan | entire home/apt | 41 | -74 |
# Summary Statistics
data_raw %>%
  select(-id, -host_id, -longitude, -latitude) %>%
  skim() %>%
  yank('numeric') %>%
  select(-p0, -p25, -p75, -p100) %>%
  kable()
| skim_variable | n_missing | complete_rate | mean | sd | p50 | hist |
|---|---|---|---|---|---|---|
| price | 0 | 1.00 | 152.7 | 240.2 | 106.00 | ▇▁▁▁▁ |
| minimum_nights | 0 | 1.00 | 7.0 | 20.5 | 3.00 | ▇▁▁▁▁ |
| number_of_reviews | 0 | 1.00 | 23.3 | 44.5 | 5.00 | ▇▁▁▁▁ |
| reviews_per_month | 10052 | 0.79 | 1.4 | 1.7 | 0.72 | ▇▁▁▁▁ |
| calculated_host_listings_count | 0 | 1.00 | 7.1 | 33.0 | 1.00 | ▇▁▁▁▁ |
| availability_365 | 0 | 1.00 | 112.8 | 131.6 | 45.00 | ▇▂▁▁▂ |
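The numeric summary shows 10052 missing values in reviews_per_month. One plausible explanation, which is an assumption and not something checked in this notebook, is that these rows are listings with no reviews at all. A minimal sketch of how that could be checked with dplyr follows. The tibble `toy` holds made-up rows for illustration only; with the real data, `toy` would be replaced by `data_raw`.

```r
library(dplyr)

# Made-up rows for illustration only; these are NOT taken from the Airbnb data.
toy <- tibble(
  number_of_reviews = c(0, 12, 0, 3),
  reviews_per_month = c(NA, 0.8, NA, 0.1)
)

# Cross-tabulate "listing has no reviews" against "reviews_per_month is missing".
# If the assumption holds, only the FALSE/FALSE and TRUE/TRUE rows appear.
missing_check <- toy %>%
  count(no_reviews   = number_of_reviews == 0,
        rate_missing = is.na(reviews_per_month))

missing_check
```

If the two flags always agree, the missing values could reasonably be treated as "no reviews yet" and not as data errors.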
data_raw %>%
  skim() %>%
  yank('character') %>%
  select(-whitespace, -empty) %>%
  kable()
| skim_variable | n_missing | complete_rate | min | max | n_unique |
|---|---|---|---|---|---|
| name | 16 | 1 | 1 | 179 | 47469 |
| host_name | 21 | 1 | 1 | 35 | 11428 |
| neighbourhood_group | 0 | 1 | 5 | 13 | 5 |
| neighbourhood | 0 | 1 | 4 | 26 | 221 |
| room_type | 0 | 1 | 11 | 15 | 3 |
I use the ggplot2 package to build the plots.
# Count listings per neighbourhood group and room type, then plot one panel per group
data_raw %>%
  count(neighbourhood_group, room_type) %>%
  ggplot() +
  geom_col(aes(room_type, n, fill = room_type)) +
  scale_fill_viridis_d(option = "viridis") +
  theme(legend.position = "none") +
  labs(title = 'Number of Listings by Room Type and Neighbourhood Group',
       x = 'Room Type',
       y = 'Count') +
  facet_wrap(~neighbourhood_group, ncol = 2) +
  coord_flip()
The Bronx and Staten Island have the fewest listings, while Brooklyn and Manhattan have the most. Entire home/apt is the most common room type in New York.
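Raw counts make it hard to compare a small borough with a large one, so it can help to look at the share of each room type within a neighbourhood group. The following is a minimal sketch, not part of the original analysis. The tibble `toy` contains made-up rows, and `room_share` is a hypothetical name; with the real data, `toy` would be replaced by `data_raw`.

```r
library(dplyr)

# Made-up rows for illustration only; these are NOT taken from the Airbnb data.
toy <- tibble(
  neighbourhood_group = c("brooklyn", "brooklyn", "brooklyn", "bronx"),
  room_type           = c("private room", "entire home/apt",
                          "private room", "private room")
)

# Count listings per group and room type, then convert the counts
# to shares that sum to 1 within each neighbourhood group.
room_share <- toy %>%
  count(neighbourhood_group, room_type) %>%
  group_by(neighbourhood_group) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

room_share
```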
# Price Density by Room Type and Neighbourhood Group
# Density of price (log scale), one panel per room type and neighbourhood group
data_raw %>%
  ggplot() +
  geom_density(aes(x = price, fill = neighbourhood_group)) +
  scale_x_log10() +
  scale_fill_viridis_d(option = "viridis") +
  theme(legend.position = 'none') +
  ggtitle("Price Distribution by Room Type and Neighbourhood Group") +
  facet_grid(room_type ~ neighbourhood_group)
On average, entire home/apt listings on Staten Island are priced higher than the others.
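Averages are difficult to read off density panels, so a statement about average price is easier to verify from a summary table. The following is a minimal sketch under stated assumptions. The tibble `toy` contains made-up prices, `price_summary` is a hypothetical name, and dropping zero prices is my own choice because a log-scaled axis cannot display them. With the real data, `toy` would be replaced by `data_raw`.

```r
library(dplyr)

# Made-up rows for illustration only; these are NOT taken from the Airbnb data.
toy <- tibble(
  neighbourhood_group = c("bronx", "bronx", "bronx", "queens", "queens"),
  room_type           = rep("entire home/apt", 5),
  price               = c(100, 120, 0, 150, 250)
)

# Mean and median price per neighbourhood group and room type,
# sorted so the most expensive combination comes first.
price_summary <- toy %>%
  filter(price > 0) %>%                      # assumption: treat 0 as invalid
  group_by(neighbourhood_group, room_type) %>%
  summarise(mean_price   = mean(price),
            median_price = median(price),
            listings     = n(),
            .groups      = "drop") %>%
  arrange(desc(median_price))

price_summary
```

Reporting the median next to the mean is useful here because the summary statistics show a price distribution with a long right tail (mean 152.7, median 106).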
# Top 10 Popular Places
# Count how often each listing name appears per neighbourhood group, keep the top 10
data_raw %>%
  group_by(neighbourhood_group, name) %>%
  count(name) %>%
  mutate(hotel = paste(name, "-", neighbourhood_group)) %>%
  arrange(-n) %>%
  head(n = 10) %>%
  ggplot() +
  geom_bar(aes(x = hotel, y = n, fill = hotel), stat = "identity") +
  scale_fill_viridis_d(option = "viridis") +
  coord_flip() +
  theme(legend.position = "none") +
  ggtitle("Top 10 Popular Places in NY")
The most popular (most common) place in New York City is Hillside Hotel, located in Queens, followed by Private Room in Brooklyn.
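The ranking above counts how many times the same listing name appears within a neighbourhood group. A minimal sketch of that counting step on its own is shown here, so the numbers behind the chart can be inspected as a table. The tibble `toy` contains made-up rows and `top_names` is a hypothetical name; dropping missing names is my own assumption, based on the 16 missing values reported for `name` in the summary. With the real data, `toy` would be replaced by `data_raw`.

```r
library(dplyr)

# Made-up rows for illustration only; these are NOT taken from the Airbnb data.
toy <- tibble(
  neighbourhood_group = c("queens", "queens", "brooklyn",
                          "brooklyn", "queens", "bronx"),
  name                = c("hillside hotel", "hillside hotel", "private room",
                          "private room", "hillside hotel", NA)
)

# Drop listings without a name, count repeated names per neighbourhood group,
# sort by frequency, and keep at most the ten most frequent.
top_names <- toy %>%
  filter(!is.na(name)) %>%
  count(neighbourhood_group, name, sort = TRUE) %>%
  slice_head(n = 10)

top_names
```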
I use the leaflet package to make an interactive map.
# Mapping Location
# Clustered markers for every listing, labelled with name, room type and minimum nights
data_raw %>%
  select(longitude, neighbourhood_group,
         neighbourhood, latitude, price,
         name, room_type, minimum_nights) %>%
  leaflet() %>%
  setView(lng = -73.95, lat = 40.73, zoom = 10) %>%
  addTiles() %>%
  addMarkers(lng = ~longitude, lat = ~latitude,
             clusterOptions = markerClusterOptions(),
             label = ~paste(name, "|",
                            "Room Type :", room_type, "|",
                            "Min Nights :", minimum_nights))
The map above shows the location of each listing.
Amri Rohman
Sidoarjo, East Java, ID