About Dataset


Since 2008, guests and hosts have used Airbnb to expand travel possibilities and to offer a more unique, personalized way of experiencing the world. This dataset describes the listing activity and metrics in NYC, NY for 2019. The data file includes the information needed to learn more about hosts and geographical availability, along with the metrics needed to make predictions and draw conclusions.

Download Dataset

Loading Libraries and Reading the Data


I want to explore the information in this dataset. First, load all the packages that we will use during the analysis.

library(tidyverse)
library(janitor)
library(leaflet)
library(skimr)
library(knitr)

# Read the listings and convert every character column to lower case
data_raw <- read_csv("new-york-city-airbnb-open-data/AB_NYC_2019.csv") %>%
  mutate_if(is.character, tolower)

I used several packages for this task: tidyverse, janitor, skimr, leaflet, and knitr.

Data Preparation


data_raw %>%
  select(name, host_name, 
         neighbourhood_group, room_type, 
         latitude, longitude) %>% 
  head() %>%
  kable()

| name | host_name | neighbourhood_group | room_type | latitude | longitude |
|:---|:---|:---|:---|---:|---:|
| clean & quiet apt home by the park | john | brooklyn | private room | 41 | -74 |
| skylit midtown castle | jennifer | manhattan | entire home/apt | 41 | -74 |
| the village of harlem….new york ! | elisabeth | manhattan | private room | 41 | -74 |
| cozy entire floor of brownstone | lisaroxanne | brooklyn | entire home/apt | 41 | -74 |
| entire apt: spacious studio/loft by central park | laura | manhattan | entire home/apt | 41 | -74 |
| large cozy 1 br apartment in midtown east | chris | manhattan | entire home/apt | 41 | -74 |

Summary


Numeric Variables

# Summary Statistics
data_raw %>%
  select(-id, -host_id, -longitude, -latitude) %>%
  skim() %>%
  yank('numeric') %>%
  select(-p0, -p25, -p75, -p100) %>%
  kable() 

| skim_variable | n_missing | complete_rate | mean | sd | p50 | hist |
|:---|---:|---:|---:|---:|---:|:---|
| price | 0 | 1.00 | 152.7 | 240.2 | 106.00 | ▇▁▁▁▁ |
| minimum_nights | 0 | 1.00 | 7.0 | 20.5 | 3.00 | ▇▁▁▁▁ |
| number_of_reviews | 0 | 1.00 | 23.3 | 44.5 | 5.00 | ▇▁▁▁▁ |
| reviews_per_month | 10052 | 0.79 | 1.4 | 1.7 | 0.72 | ▇▁▁▁▁ |
| calculated_host_listings_count | 0 | 1.00 | 7.1 | 33.0 | 1.00 | ▇▁▁▁▁ |
| availability_365 | 0 | 1.00 | 112.8 | 131.6 | 45.00 | ▇▂▁▁▂ |

Character Variables

data_raw %>%
  skim() %>%
  yank('character') %>%
  select(-whitespace, -empty) %>%
  kable() 
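
| skim_variable | n_missing | complete_rate | min | max | n_unique |
|:---|---:|---:|---:|---:|---:|
| name | 16 | 1 | 1 | 179 | 47469 |
| host_name | 21 | 1 | 1 | 35 | 11428 |
| neighbourhood_group | 0 | 1 | 5 | 13 | 5 |
| neighbourhood | 0 | 1 | 4 | 26 | 221 |
| room_type | 0 | 1 | 11 | 15 | 3 |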

Visualize

I use the ggplot2 package (loaded with tidyverse) to make the graphs.

Which areas of New York have the most listings, and which room types are most common?

data_raw %>%
  count(room_type, neighbourhood_group) %>%
  ggplot()+
  geom_bar(aes(as.factor(room_type), n, fill=room_type), stat = 'identity')+
  scale_fill_viridis_d(option  = "viridis")+
  theme(legend.position = "none")+
  labs(title = 'Number of Listings by Room Type and Neighbourhood Group',
       x='Room Type', 
       y='Count')+
  facet_wrap(~neighbourhood_group, ncol = 2)+
  coord_flip()

The Bronx and Staten Island have the fewest listings, while Brooklyn and Manhattan have the most. Entire home/apartment is the most common room type in New York.
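
The comparison in the chart can also be read as plain numbers. The sketch below uses only base R on a small made-up data frame: `toy_listings` is a hypothetical stand-in for `data_raw`, and its rows are invented, so the counts are illustrative only and are not results from the Airbnb data.

```r
# Hypothetical stand-in for data_raw: invented rows, same column names
toy_listings <- data.frame(
  neighbourhood_group = c("manhattan", "manhattan", "manhattan",
                          "brooklyn", "brooklyn", "bronx"),
  room_type = c("entire home/apt", "entire home/apt", "private room",
                "private room", "entire home/apt", "shared room")
)

# Cross-tabulate listings: one row per neighbourhood group, one column per room type
counts <- table(toy_listings$neighbourhood_group, toy_listings$room_type)

# Totals per neighbourhood group and per room type, largest first
borough_totals <- sort(rowSums(counts), decreasing = TRUE)
room_totals <- sort(colSums(counts), decreasing = TRUE)

print(counts)
print(borough_totals)
print(room_totals)
```

With the real data, passing the `neighbourhood_group` and `room_type` columns of `data_raw` to `table()` in the same way should give the counts behind the chart.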

What about room prices in New York?

# Density of price by room type and neighbourhood group (log10 price axis)
data_raw %>%
  ggplot()+
  geom_density(aes(x=price, fill=neighbourhood_group))+
  scale_x_log10()+
  scale_fill_viridis_d(option  = "viridis")+
  theme(legend.position = 'none') +
  ggtitle("Price Distribution by Room Type and Neighbourhood Group") +
  facet_grid(room_type ~ neighbourhood_group)

Entire homes or apartments on Staten Island are more expensive than the others on average.
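
A density plot on a log scale shows the shape of each price distribution, but averages are easier to compare in a summary table. The following is a minimal sketch in base R, assuming a made-up data frame: `toy_prices` is a hypothetical stand-in for `data_raw` with invented prices, so the numbers say nothing about the actual listings. It reports both the mean and the median, because a few very expensive listings can pull the mean well above the typical price.

```r
# Hypothetical stand-in for data_raw: invented prices, same column names
toy_prices <- data.frame(
  neighbourhood_group = c("manhattan", "manhattan",
                          "brooklyn", "brooklyn",
                          "staten island", "staten island", "staten island"),
  room_type = c("entire home/apt", "entire home/apt",
                "private room", "private room",
                "entire home/apt", "entire home/apt", "entire home/apt"),
  price = c(200, 300, 60, 80, 100, 120, 1000)
)

# Mean and median price for each room type within each neighbourhood group
mean_price <- aggregate(price ~ room_type + neighbourhood_group,
                        data = toy_prices, FUN = mean)
median_price <- aggregate(price ~ room_type + neighbourhood_group,
                          data = toy_prices, FUN = median)

# Combine both summaries; the shared "price" column gets the suffixes below
price_summary <- merge(mean_price, median_price,
                       by = c("room_type", "neighbourhood_group"),
                       suffixes = c("_mean", "_median"))

print(price_summary)
```

In the toy data, one listing priced at 1000 makes the Staten Island mean much larger than its median, which is the kind of gap worth checking before comparing averages across groups.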

Mapping

I use the leaflet package to make an interactive map.

# Mapping Location
data_raw %>% 
  select(longitude, neighbourhood_group, 
         neighbourhood, latitude, price,
         name, room_type, minimum_nights) %>%
  leaflet() %>% 
  setView(lng = -73.95, lat = 40.73, zoom = 10) %>%
  addTiles() %>% 
  addMarkers(lng = ~longitude, lat = ~latitude,
             clusterOptions = markerClusterOptions(),
             label = ~paste(name, "|",
                            "Room Type:", room_type, "|",
                            "Min Nights:", minimum_nights))

The map above shows the location of each listing.

Thank You

Amri Rohman
Sidoarjo, East Java, ID