Description :
Airbnb is a platform that allows house and apartment owners to rent their properties to guests for short-term stays.
PHASE 1 : Ask
About the company:
Airbnb, Inc. is an American company based in San Francisco that operates an online marketplace for short- and long-term homestays and experiences.
The company was founded in 2008 by Brian Chesky, Nathan Blecharczyk, and Joe Gebbia.
Since then, Airbnb has become one of the most successful and valuable start-ups in the world and has significantly impacted the HORECA (hotel, restaurant, and catering) industry.
Content:
Key task :
Deliverable :
PHASE 2 : Prepare
This public dataset covers Airbnb listings; the original source is Inside Airbnb.
I downloaded the data and stored it on my Google Drive.
Inside Airbnb has made the data available under CC0: Public Domain (no copyright).
I used the ROCCC system to assess the credibility and integrity of the data:
Reliability: The data is reliable. It is a public subset of Airbnb data made available for public use.
Originality: This is an original subset of the Airbnb data.
Comprehensiveness: The data is comprehensive. It covers Airbnb listings, hosts, and a range of metrics for analysis and research.
Current: The dataset is recent; the latest review date in it is 2023-03-06.
Cited: Inside Airbnb created the dataset and released it publicly, so the source is credible.
Key task :
Deliverable :
PHASE 3 : Process
# install.packages("tidyverse")
# install.packages("dplyr")
# install.packages("skimr")
# install.packages("mice")
# install.packages("randomForest")
# install.packages("corrplot")
# install.packages("ggcorrplot")
library(tidyverse)
library(dplyr)
library(skimr)
library(mice)
library(randomForest)
library(corrplot)
library(ggcorrplot)
Data Wrangling :
## id name host_id host_name
## Min. :2.595e+03 Length:42931 Min. : 1678 Length:42931
## 1st Qu.:1.940e+07 Class :character 1st Qu.: 16085328 Class :character
## Median :4.337e+07 Mode :character Median : 74338125 Mode :character
## Mean :2.223e+17 Mean :151601209
## 3rd Qu.:6.305e+17 3rd Qu.:268069240
## Max. :8.405e+17 Max. :503872891
##
## neighbourhood_group neighbourhood latitude longitude
## Length:42931 Length:42931 Min. :40.50 Min. :-74.25
## Class :character Class :character 1st Qu.:40.69 1st Qu.:-73.98
## Mode :character Mode :character Median :40.72 Median :-73.95
## Mean :40.73 Mean :-73.94
## 3rd Qu.:40.76 3rd Qu.:-73.92
## Max. :40.91 Max. :-73.71
##
## room_type price minimum_nights number_of_reviews
## Length:42931 Min. : 0.0 Min. : 1.00 Min. : 0.00
## Class :character 1st Qu.: 75.0 1st Qu.: 2.00 1st Qu.: 1.00
## Mode :character Median : 125.0 Median : 7.00 Median : 5.00
## Mean : 200.3 Mean : 18.11 Mean : 25.86
## 3rd Qu.: 200.0 3rd Qu.: 30.00 3rd Qu.: 24.00
## Max. :99000.0 Max. :1250.00 Max. :1842.00
##
## last_review reviews_per_month calculated_host_listings_count
## Length:42931 Min. : 0.010 Min. : 1.00
## Class :character 1st Qu.: 0.140 1st Qu.: 1.00
## Mode :character Median : 0.520 Median : 1.00
## Mean : 1.169 Mean : 24.05
## 3rd Qu.: 1.670 3rd Qu.: 4.00
## Max. :86.610 Max. :526.00
## NA's :10304
## availability_365 number_of_reviews_ltm license
## Min. : 0.0 Min. : 0.000 Length:42931
## 1st Qu.: 0.0 1st Qu.: 0.000 Class :character
## Median : 89.0 Median : 0.000 Mode :character
## Mean :140.3 Mean : 7.737
## 3rd Qu.:289.0 3rd Qu.: 7.000
## Max. :365.0 Max. :1093.000
##
nyc_df <- nyc_list %>%
  rename(list_id = id,
         listing_name = name,
         area = neighbourhood_group,
         geo_location = neighbourhood,
         host_list_count = calculated_host_listings_count,
         reviews_per_year = number_of_reviews_ltm,
         reviews_per_month_pct = reviews_per_month) %>%
  select(-license)
nyc_df
| Name | nyc_df |
| Number of rows | 42931 |
| Number of columns | 17 |
| _______________________ | |
| Column type frequency: | |
| character | 6 |
| numeric | 11 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| listing_name | 0 | 1 | 0 | 249 | 10 | 41410 | 0 |
| host_name | 0 | 1 | 0 | 35 | 5 | 9832 | 0 |
| area | 0 | 1 | 5 | 13 | 0 | 5 | 0 |
| geo_location | 0 | 1 | 4 | 25 | 0 | 223 | 0 |
| room_type | 0 | 1 | 10 | 15 | 0 | 4 | 0 |
| last_review | 0 | 1 | 0 | 10 | 10304 | 2796 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| list_id | 0 | 1.00 | 2.222772e+17 | 3.344213e+17 | 2595.00 | 19404736.00 | 43374815.00 | 6.305016e+17 | 8.404660e+17 |
| host_id | 0 | 1.00 | 1.516012e+08 | 1.621301e+08 | 1678.00 | 16085328.00 | 74338125.00 | 2.680692e+08 | 5.038729e+08 |
| latitude | 0 | 1.00 | 4.073000e+01 | 6.000000e-02 | 40.50 | 40.69 | 40.72 | 4.076000e+01 | 4.091000e+01 |
| longitude | 0 | 1.00 | -7.394000e+01 | 6.000000e-02 | -74.25 | -73.98 | -73.95 | -7.392000e+01 | -7.371000e+01 |
| price | 0 | 1.00 | 2.003100e+02 | 8.950800e+02 | 0.00 | 75.00 | 125.00 | 2.000000e+02 | 9.900000e+04 |
| minimum_nights | 0 | 1.00 | 1.811000e+01 | 2.746000e+01 | 1.00 | 2.00 | 7.00 | 3.000000e+01 | 1.250000e+03 |
| number_of_reviews | 0 | 1.00 | 2.586000e+01 | 5.662000e+01 | 0.00 | 1.00 | 5.00 | 2.400000e+01 | 1.842000e+03 |
| reviews_per_month_pct | 10304 | 0.76 | 1.170000e+00 | 1.790000e+00 | 0.01 | 0.14 | 0.52 | 1.670000e+00 | 8.661000e+01 |
| host_list_count | 0 | 1.00 | 2.405000e+01 | 8.087000e+01 | 1.00 | 1.00 | 1.00 | 4.000000e+00 | 5.260000e+02 |
| availability_365 | 0 | 1.00 | 1.402600e+02 | 1.420000e+02 | 0.00 | 0.00 | 89.00 | 2.890000e+02 | 3.650000e+02 |
| reviews_per_year | 0 | 1.00 | 7.740000e+00 | 1.829000e+01 | 0.00 | 0.00 | 0.00 | 7.000000e+00 | 1.093000e+03 |
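nyc_df$reviews_per_year <- as.integer(nyc_df$reviews_per_year)
# Parse the date first (the explicit format turns empty strings into NA),
# then store it as an integer count of days since 1970-01-01.
nyc_df$last_review <- as.integer(as.Date(nyc_df$last_review, format = "%Y-%m-%d"))
nyc_df <- as.data.frame(nyc_df)
skim_without_charts(nyc_df)
| Name | nyc_df |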
| Number of rows | 42931 |
| Number of columns | 17 |
| _______________________ | |
| Column type frequency: | |
| character | 5 |
| numeric | 12 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| listing_name | 10 | 1 | 1 | 249 | 0 | 41409 | 0 |
| host_name | 5 | 1 | 1 | 35 | 0 | 9831 | 0 |
| area | 0 | 1 | 5 | 13 | 0 | 5 | 0 |
| geo_location | 0 | 1 | 4 | 25 | 0 | 223 | 0 |
| room_type | 0 | 1 | 10 | 15 | 0 | 4 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| list_id | 0 | 1.00 | 2.222772e+17 | 3.344213e+17 | 2595.00 | 19404736.00 | 43374815.00 | 6.305016e+17 | 8.404660e+17 |
| host_id | 0 | 1.00 | 1.516012e+08 | 1.621301e+08 | 1678.00 | 16085328.00 | 74338125.00 | 2.680692e+08 | 5.038729e+08 |
| latitude | 0 | 1.00 | 4.073000e+01 | 6.000000e-02 | 40.50 | 40.69 | 40.72 | 4.076000e+01 | 4.091000e+01 |
| longitude | 0 | 1.00 | -7.394000e+01 | 6.000000e-02 | -74.25 | -73.98 | -73.95 | -7.392000e+01 | -7.371000e+01 |
| price | 0 | 1.00 | 2.003100e+02 | 8.950800e+02 | 0.00 | 75.00 | 125.00 | 2.000000e+02 | 9.900000e+04 |
| minimum_nights | 0 | 1.00 | 1.811000e+01 | 2.746000e+01 | 1.00 | 2.00 | 7.00 | 3.000000e+01 | 1.250000e+03 |
| number_of_reviews | 0 | 1.00 | 2.586000e+01 | 5.662000e+01 | 0.00 | 1.00 | 5.00 | 2.400000e+01 | 1.842000e+03 |
| last_review | 10304 | 0.76 | 1.885591e+04 | 8.011800e+02 | 15106.00 | 18331.00 | 19319.00 | 1.938900e+04 | 1.942200e+04 |
| reviews_per_month_pct | 10304 | 0.76 | 1.170000e+00 | 1.790000e+00 | 0.01 | 0.14 | 0.52 | 1.670000e+00 | 8.661000e+01 |
| host_list_count | 0 | 1.00 | 2.405000e+01 | 8.087000e+01 | 1.00 | 1.00 | 1.00 | 4.000000e+00 | 5.260000e+02 |
| availability_365 | 0 | 1.00 | 1.402600e+02 | 1.420000e+02 | 0.00 | 0.00 | 89.00 | 2.890000e+02 | 3.650000e+02 |
| reviews_per_year | 0 | 1.00 | 7.740000e+00 | 1.829000e+01 | 0.00 | 0.00 | 0.00 | 7.000000e+00 | 1.093000e+03 |
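The `last_review` row in the numeric table above ranges from 15106 to 19422 because a date stored as an integer is a count of days since 1970-01-01. A minimal sketch of that round trip follows; `raw` is a made-up vector, and the sketch assumes the column is parsed as a date before `as.integer()` is applied.

```r
# Hypothetical sample of last_review strings, including one empty value
raw <- c("2011-05-12", "", "2023-03-06")

# An explicit format turns "" into NA instead of raising an error
as_date <- as.Date(raw, format = "%Y-%m-%d")

# Dates become day counts since 1970-01-01
as_int <- as.integer(as_date)
print(as_int)

# Supplying the origin converts the day counts back to dates
back <- as.Date(as_int, origin = "1970-01-01")
print(back)
```

The two non-missing values reproduce the p0 and p100 entries of `last_review` in the table.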
##
## iter imp variable
## 1 1 last_review reviews_per_month_pct
## 1 2 last_review reviews_per_month_pct
## 1 3 last_review reviews_per_month_pct
## 1 4 last_review reviews_per_month_pct
## 1 5 last_review reviews_per_month_pct
## 1 6 last_review reviews_per_month_pct
## 1 7 last_review reviews_per_month_pct
## 1 8 last_review reviews_per_month_pct
## 1 9 last_review reviews_per_month_pct
## 1 10 last_review reviews_per_month_pct
## 2 1 last_review reviews_per_month_pct
## 2 2 last_review reviews_per_month_pct
## 2 3 last_review reviews_per_month_pct
## 2 4 last_review reviews_per_month_pct
## 2 5 last_review reviews_per_month_pct
## 2 6 last_review reviews_per_month_pct
## 2 7 last_review reviews_per_month_pct
## 2 8 last_review reviews_per_month_pct
## 2 9 last_review reviews_per_month_pct
## 2 10 last_review reviews_per_month_pct
## 3 1 last_review reviews_per_month_pct
## 3 2 last_review reviews_per_month_pct
## 3 3 last_review reviews_per_month_pct
## 3 4 last_review reviews_per_month_pct
## 3 5 last_review reviews_per_month_pct
## 3 6 last_review reviews_per_month_pct
## 3 7 last_review reviews_per_month_pct
## 3 8 last_review reviews_per_month_pct
## 3 9 last_review reviews_per_month_pct
## 3 10 last_review reviews_per_month_pct
## 4 1 last_review reviews_per_month_pct
## 4 2 last_review reviews_per_month_pct
## 4 3 last_review reviews_per_month_pct
## 4 4 last_review reviews_per_month_pct
## 4 5 last_review reviews_per_month_pct
## 4 6 last_review reviews_per_month_pct
## 4 7 last_review reviews_per_month_pct
## 4 8 last_review reviews_per_month_pct
## 4 9 last_review reviews_per_month_pct
## 4 10 last_review reviews_per_month_pct
## 5 1 last_review reviews_per_month_pct
## 5 2 last_review reviews_per_month_pct
## 5 3 last_review reviews_per_month_pct
## 5 4 last_review reviews_per_month_pct
## 5 5 last_review reviews_per_month_pct
## 5 6 last_review reviews_per_month_pct
## 5 7 last_review reviews_per_month_pct
## 5 8 last_review reviews_per_month_pct
## 5 9 last_review reviews_per_month_pct
## 5 10 last_review reviews_per_month_pct
## Warning: Number of logged events: 5
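The log above shows 5 iterations with 10 imputations each for `last_review` and `reviews_per_month_pct`, but the call that produced it is not included in this notebook. The sketch below shows one way a log of that shape could arise with `mice()`. The toy data frame, the `pmm` method, and the seed are assumptions for illustration and may differ from what was actually run.

```r
library(mice)

set.seed(1)
# Toy data (made-up values): `per_month` has missing entries, the rest is complete
toy <- data.frame(
  reviews = rpois(60, 20),
  nights  = sample(1:30, 60, replace = TRUE)
)
toy$per_month <- toy$reviews / 12 + rnorm(60, sd = 0.2)
toy$per_month[sample(60, 15)] <- NA

# m = 10 imputed data sets and maxit = 5 iterations, matching the shape of the log;
# method = "pmm" (predictive mean matching) is an assumption
imp <- mice(toy, m = 10, maxit = 5, method = "pmm", seed = 1, printFlag = FALSE)

# Extract the first completed data set
filled <- complete(imp, 1)
print(colSums(is.na(filled)))
```

Predictive mean matching fills gaps with values observed elsewhere in the column, so imputed dates and rates stay within the observed range.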
nyc_df$host_id <- as.numeric(nyc_df$host_id)
nyc_df$price <- as.numeric(nyc_df$price)
nyc_df$minimum_nights <- as.numeric(nyc_df$minimum_nights)
nyc_df$number_of_reviews <- as.numeric(nyc_df$number_of_reviews)
# last_review holds days since 1970-01-01; the explicit origin keeps this working on older R versions
nyc_df$last_review <- as.Date(nyc_df$last_review, origin = "1970-01-01")
nyc_df$year <- format(nyc_df$last_review, "%Y") # year of the last review
| Name | test_com |
| Number of rows | 42931 |
| Number of columns | 18 |
| _______________________ | |
| Column type frequency: | |
| character | 6 |
| Date | 1 |
| numeric | 11 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| listing_name | 10 | 1 | 1 | 249 | 0 | 41409 | 0 |
| host_name | 5 | 1 | 1 | 35 | 0 | 9831 | 0 |
| area | 0 | 1 | 5 | 13 | 0 | 5 | 0 |
| geo_location | 0 | 1 | 4 | 25 | 0 | 223 | 0 |
| room_type | 0 | 1 | 10 | 15 | 0 | 4 | 0 |
| year | 0 | 1 | 4 | 4 | 0 | 13 | 0 |
Variable type: Date
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| last_review | 0 | 1 | 2011-05-12 | 2023-03-06 | 2022-03-06 | 2795 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| list_id | 0 | 1 | 2.222772e+17 | 3.344213e+17 | 2595.00 | 19404736.00 | 43374815.00 | 6.305016e+17 | 8.404660e+17 |
| host_id | 0 | 1 | 1.516012e+08 | 1.621301e+08 | 1678.00 | 16085328.00 | 74338125.00 | 2.680692e+08 | 5.038729e+08 |
| latitude | 0 | 1 | 4.073000e+01 | 6.000000e-02 | 40.50 | 40.69 | 40.72 | 4.076000e+01 | 4.091000e+01 |
| longitude | 0 | 1 | -7.394000e+01 | 6.000000e-02 | -74.25 | -73.98 | -73.95 | -7.392000e+01 | -7.371000e+01 |
| price | 0 | 1 | 2.003100e+02 | 8.950800e+02 | 0.00 | 75.00 | 125.00 | 2.000000e+02 | 9.900000e+04 |
| minimum_nights | 0 | 1 | 1.811000e+01 | 2.746000e+01 | 1.00 | 2.00 | 7.00 | 3.000000e+01 | 1.250000e+03 |
| number_of_reviews | 0 | 1 | 2.586000e+01 | 5.662000e+01 | 0.00 | 1.00 | 5.00 | 2.400000e+01 | 1.842000e+03 |
| reviews_per_month_pct | 0 | 1 | 9.100000e-01 | 1.630000e+00 | 0.01 | 0.08 | 0.25 | 1.170000e+00 | 8.661000e+01 |
| host_list_count | 0 | 1 | 2.405000e+01 | 8.087000e+01 | 1.00 | 1.00 | 1.00 | 4.000000e+00 | 5.260000e+02 |
| availability_365 | 0 | 1 | 1.402600e+02 | 1.420000e+02 | 0.00 | 0.00 | 89.00 | 2.890000e+02 | 3.650000e+02 |
| reviews_per_year | 0 | 1 | 7.740000e+00 | 1.829000e+01 | 0.00 | 0.00 | 0.00 | 7.000000e+00 | 1.093000e+03 |
Key task :
Deliverable :
PHASE 4 : Analysis
Descriptive statistics are used to summarize and explore the data.
Total Hosts
## [1] 27455
## [1] 42931 18
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 75.0 125.0 200.3 200.0 99000.0
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.0 75.0 125.0 200.4 200.0 99000.0
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 2.00 7.00 18.12 30.00 1250.00
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 1.00 5.00 25.83 24.00 1842.00
##
## Bronx Brooklyn Manhattan Queens Staten Island
## 1690 16234 17635 6916 429
##
## Entire home/apt Hotel room Private room Shared room
## 24279 170 17879 576
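The frequency tables above can be turned into shares directly. The sketch below copies the area counts from the output above into a hand-built named vector (`area_counts` is an illustrative name, not an object from this notebook) and converts them to percentages.

```r
# Row counts per area, copied from the table output above
area_counts <- c(Bronx = 1690, Brooklyn = 16234, Manhattan = 17635,
                 Queens = 6916, `Staten Island` = 429)

# Share of listings (rows) per area, in percent
area_share <- round(100 * prop.table(area_counts), 1)
print(area_share)

# Total number of rows behind the table
print(sum(area_counts))
```

The total is 42,904, which is 27 fewer than the 42,931 rows reported earlier, so some rows were filtered out before tabulating; the filter itself is not shown.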
area_freq <- test_com %>%
  group_by(area) %>%
  summarise(total_list = sum(host_list_count)) %>%
  mutate(percent = total_list * 100 / sum(total_list))
area_freq
price_freq <- test_com %>%
  mutate(price_range = case_when(price > 5 & price < 50 ~ "10 - 49",
                                 price >= 50 & price < 100 ~ "50 - 99",
                                 price >= 100 & price < 200 ~ "100 - 199",
                                 price >= 200 & price < 300 ~ "200 - 299",
                                 price >= 300 & price < 1000 ~ "300 - 999",
                                 price >= 1000 ~ "above 1000")) %>%
  group_by(price_range) %>%
  summarise(total_list = sum(host_list_count)) %>%
  mutate(percent = total_list * 100 / sum(total_list)) %>%
  arrange(price_range)
price_freq
geo_area_freq <- test_com %>%
  group_by(geo_location, area) %>%
  summarise(total_list = sum(host_list_count),
            min_price = min(price),
            avg_price = mean(price),
            max_price = max(price),
            most_price = median(price)) # median price
geo_area_freq
year_room_freq <- test_com %>%
  group_by(year, room_type) %>%
  summarise(total_list = sum(host_list_count),
            reviews_per_year = sum(reviews_per_year)) %>%
  mutate(percent = 100 * reviews_per_year / sum(reviews_per_year))
year_room_freq
host_list_count_total <- test_com %>%
  group_by(host_name) %>%
  summarise(total_list_count = sum(host_list_count)) %>%
  arrange(desc(total_list_count))
host_list_count_total
host_price_total <- test_com %>%
  group_by(host_name) %>%
  summarise(total_price = sum(price)) %>%
  arrange(desc(total_price))
host_price_total
host_room_total <- test_com %>%
  select(host_name, room_type, price) %>%
  group_by(host_name, room_type) %>%
  summarise(total_price = sum(price)) %>%
  arrange(desc(total_price))
## `summarise()` has grouped output by 'host_name'. You can override using the
## `.groups` argument.
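The message above appears because `summarise()` removes only the last grouping variable, so the result is still grouped by `host_name`. A minimal sketch of making the grouping explicit with `.groups = "drop"`, on a made-up data frame (`toy` and `host_room` are illustrative names):

```r
library(dplyr)

# Made-up listings for two hosts
toy <- data.frame(
  host_name = c("A", "A", "B", "B", "B"),
  room_type = c("Private room", "Private room", "Entire home/apt",
                "Private room", "Entire home/apt"),
  price     = c(50, 70, 200, 80, 150)
)

host_room <- toy %>%
  group_by(host_name, room_type) %>%
  # .groups = "drop" returns an ungrouped result and silences the message
  summarise(total_price = sum(price), .groups = "drop") %>%
  arrange(desc(total_price))

print(host_room)
print(is_grouped_df(host_room))
```

Dropping the groups matters when a later `mutate()`, `filter()`, or `slice()` should work on the whole table rather than within each host.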
room_price_freq <- test_com %>%
  select(room_type, area, price, host_list_count) %>%
  group_by(room_type) %>%
  summarise(min_price = min(price),
            avg_price = mean(price),
            most_price = median(price), # median price
            max_price = max(price),
            total_list = sum(host_list_count)) %>%
  mutate(list_percent = total_list * 100 / sum(total_list))
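The summaries above compute `total_list` as `sum(host_list_count)`. If `host_list_count` holds each host's total number of listings repeated on every one of that host's rows (which the original column name `calculated_host_listings_count` suggests, but which is an assumption here), then summing it counts a host with k listings k × k times. The sketch below contrasts that sum with a plain row count on made-up data.

```r
library(dplyr)

# Made-up listings: host 1 has 3 listings, host 2 has 1
toy <- data.frame(
  host_id         = c(1, 1, 1, 2),
  area            = c("Manhattan", "Manhattan", "Manhattan", "Bronx"),
  host_list_count = c(3, 3, 3, 1)
)

counts <- toy %>%
  group_by(area) %>%
  summarise(summed_count = sum(host_list_count), # 3 + 3 + 3 = 9 for Manhattan
            n_rows       = n(),                  # 3 rows for Manhattan
            .groups = "drop")
print(counts)
```

Under that assumption, `n()` gives the number of listings per group, while the summed column weights each listing by the size of its host's portfolio.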
PHASE 5 : Visualization
corr_df <- test_com %>%
  select(list_id, host_id, price, minimum_nights,
         number_of_reviews, last_review, reviews_per_month_pct,
         host_list_count, reviews_per_year, availability_365, year)
corr_df$year <- as.numeric(corr_df$year)
corr_df$last_review <- as.numeric(corr_df$last_review)
str(corr_df)
## 'data.frame': 42904 obs. of 11 variables:
## $ list_id : num 2595 5121 5203 5178 5136 ...
## $ host_id : num 2845 7356 7490 8967 7378 ...
## $ price : num 150 60 75 68 275 93 295 124 200 81 ...
## $ minimum_nights : num 30 30 2 2 60 3 4 3 1 30 ...
## $ number_of_reviews : num 49 50 118 575 3 350 45 223 68 189 ...
## $ last_review : num 19164 18232 17368 19407 19214 ...
## $ reviews_per_month_pct: num 0.3 0.3 0.72 3.41 0.03 2.25 0.27 1.32 0.44 1.13 ...
## $ host_list_count : int 3 2 1 1 1 1 1 3 4 1 ...
## $ reviews_per_year : int 1 0 0 52 1 48 4 17 0 5 ...
## $ availability_365 : int 314 365 0 106 181 145 1 164 310 207 ...
## $ year : num 2022 2019 2017 2023 2022 ...
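`corr_df` now contains only numeric columns, which is the input a correlation matrix needs; the correlation call itself is not shown in this notebook. A minimal sketch with base R `cor()` on a made-up numeric data frame (`toy` and `corr_mat` are illustrative names):

```r
set.seed(42)
# Toy numeric frame standing in for corr_df (made-up values)
toy <- data.frame(price          = runif(50, 50, 500),
                  minimum_nights = sample(1:30, 50, replace = TRUE))
toy$number_of_reviews <- round(100 - 0.1 * toy$price + rnorm(50, sd = 5))

# Pairwise Pearson correlations; use = "complete.obs" skips rows containing NA
corr_mat <- round(cor(toy, use = "complete.obs"), 2)
print(corr_mat)

# With the packages loaded earlier, the matrix could be drawn with, for example:
# ggcorrplot::ggcorrplot(corr_mat, lab = TRUE)
```

Identifier columns such as `list_id` and `host_id` are numeric but carry no quantity, so their correlations are usually not worth interpreting.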
area_freq %>%
  ggplot(aes(area, total_list, fill = area)) +
  geom_bar(position = "dodge", stat = "identity") +
  geom_text(aes(label = total_list), vjust = 0) +
  guides(fill = guide_legend(title = "Area")) +
  theme(legend.position = "none") +
  labs(x = "Cities",
       y = "Total Listings",
       title = "Total Listings made at Cities :",
       caption = "Data Analyst : JP") +
  theme_minimal() +
  scale_y_continuous(labels = scales::comma) +
  theme(legend.position = "none")
price_freq %>%
  ggplot(aes(price_range, total_list, fill = total_list)) +
  geom_bar(position = "dodge", stat = "identity") +
  geom_text(aes(label = total_list), vjust = 0) +
  scale_y_continuous(labels = scales::comma) +
  guides(fill = guide_legend(title = "Total Lists")) +
  theme(legend.position = "none") +
  labs(x = "Price Range",
       y = "Total Listings",
       title = "Price Range for Total listings :",
       subtitle = "cheap airbnb price range",
       caption = "Data Analyst : JP") +
  theme_minimal() +
  scale_y_continuous(labels = scales::comma) +
  theme(legend.position = "none")
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
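Because `price_range` is a character column, both `arrange()` and the x-axis of the chart order the buckets alphabetically, which places "100 - 199" before "50 - 99". One way to keep the buckets in numeric order is to build them as an ordered set of factor levels with `cut()`. The sketch below uses made-up prices, and the break points and labels are illustrative rather than taken from the analysis.

```r
# Made-up prices
price <- c(12, 49, 50, 99, 100, 250, 300, 999, 1000, 5000)

breaks <- c(10, 50, 100, 200, 300, 1000, Inf)
labels <- c("10 - 49", "50 - 99", "100 - 199", "200 - 299", "300 - 999", "1000+")

# right = FALSE makes each interval [lower, upper), i.e. >= lower and < upper
price_range <- cut(price, breaks = breaks, labels = labels, right = FALSE)

print(table(price_range))
print(levels(price_range))
```

A factor keeps its level order when sorted or plotted, so the bars would appear from the cheapest bucket to the most expensive.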
ggplot(test_com, aes(x = room_type, y = price)) +
  geom_boxplot(aes(fill = room_type)) +
  scale_y_log10(limits = c(1, 10000), labels = scales::comma) +
  geom_hline(yintercept = mean(test_com$price), color = "purple", linetype = 6) +
  annotate("text", x = 1,
           y = median(test_com$price[test_com$room_type == "Entire home/apt"]),
           label = round(median(test_com$price[test_com$room_type == "Entire home/apt"]), 2),
           size = 5, color = "white") +
  annotate("text", x = 2,
           y = median(test_com$price[test_com$room_type == "Hotel room"]),
           label = round(median(test_com$price[test_com$room_type == "Hotel room"]), 2),
           size = 5, color = "red") +
  annotate("text", x = 3,
           y = median(test_com$price[test_com$room_type == "Private room"]),
           label = round(median(test_com$price[test_com$room_type == "Private room"]), 2),
           size = 5, color = "lightgreen") +
  annotate("text", x = 4,
           y = median(test_com$price[test_com$room_type == "Shared room"]),
           label = round(median(test_com$price[test_com$room_type == "Shared room"]), 2),
           size = 5, color = "green") +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(x = "Room Type",
       y = "Price",
       title = "Price Distribution by Room Type :",
       caption = "Data Analyst : JP")
## Warning: Removed 7 rows containing non-finite values (`stat_boxplot()`).
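The warning above follows from the `limits = c(1, 10000)` argument: values outside a scale's limits are turned into missing values and dropped before the boxplot statistics are computed. The maximum price in the data is 99,000, so some listings fall above the upper limit. The sketch below mimics that filtering in base R on made-up prices.

```r
# Made-up prices, two of them above the upper limit used in the plot
price  <- c(40, 125, 900, 15000, 99000)
limits <- c(1, 10000)

# Flag values that a scale with these limits would drop
outside <- price < limits[1] | price > limits[2]
print(sum(outside))

# Median with and without the out-of-range values
print(median(price))
print(median(price[!outside]))
```

The medians printed by `annotate()` in the plot are computed on all rows, so they can differ slightly from the boxes, which are drawn without the dropped rows.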
room_price_freq %>%
  ggplot(aes(room_type, avg_price, fill = room_type)) +
  geom_bar(position = "dodge", stat = "identity") +
  scale_y_continuous(labels = scales::comma) +
  guides(fill = guide_legend(title = "Most Price")) +
  geom_text(aes(label = most_price), vjust = 0) +
  theme(legend.position = "none") +
  labs(x = "Room Types",
       y = "Average Price",
       title = "Room Types with Average Price :",
       subtitle = "Median Price floating for Room Types",
       caption = "Data Analyst : JP") +
  theme_minimal() +
  scale_y_continuous(labels = scales::comma) +
  theme(legend.position = "none")
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
year_room_freq %>%
  ggplot(aes(year, total_list, color = room_type)) +
  scale_y_continuous(labels = scales::comma) +
  geom_point(size = 2, alpha = 1) + # alpha must lie between 0 and 1
  labs(x = "Years",
       y = "Total Listings",
       title = "Total Listings made over Years :",
       subtitle = "",
       caption = "Data Analyst : JP") +
  theme_minimal()
host_list_count_total %>%
  slice(1:5) %>%
  ggplot(aes(host_name, total_list_count, fill = host_name)) +
  geom_bar(position = "dodge", stat = "identity") +
  geom_text(aes(label = total_list_count), vjust = 0) +
  theme(legend.position = "none") +
  labs(x = "Host Name",
       y = "Total Listings",
       title = "Top 5 Listings by Hosts",
       subtitle = "",
       caption = "Data Analyst : JP") +
  theme_minimal() +
  scale_y_continuous(labels = scales::comma) +
  theme(legend.position = "none")
host_room_total %>%
  filter(total_price >= 52089) %>%
  ggplot(aes(total_price, host_name, fill = total_price)) +
  geom_bar(position = "dodge", stat = "identity") +
  scale_fill_steps2() +
  geom_text(aes(label = total_price), vjust = 0) +
  theme(legend.position = "none") +
  labs(x = "Total Price in $ ",
       y = "Host Name",
       color = "Room Type",
       title = "Top Prices by Host Names",
       subtitle = "",
       caption = "Data Analyst : JP") +
  theme_minimal() +
  theme(legend.position = "none")
test_com %>%
  ggplot(aes(reviews_per_month_pct, reviews_per_year, color = room_type)) +
  geom_point(size = 2, alpha = 0.8) +
  geom_smooth(method = "lm", se = FALSE, color = "purple") +
  labs(x = "Reviews per Month PCT",
       y = "Reviews per Year",
       color = "Room type",
       title = "Airbnb's Reviews per Month by Year:",
       subtitle = "Linear Regression Model has 'Strong' fit",
       caption = "Data Analyst : JP") +
  annotate("text", x = 15, y = 900, label = "R^2 = 0.73", color = "darkgreen",
           fontface = "bold", size = 5, angle = 25) +
  theme_minimal()
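The R^2 value in the annotation is typed in by hand, and the model fit behind it is not shown. A minimal sketch of computing it from a fitted linear model, so that the label follows the data; the vectors below are made-up stand-ins for `reviews_per_month_pct` and `reviews_per_year`.

```r
set.seed(7)
# Toy stand-ins for the two review columns (made-up values)
per_month <- runif(100, 0, 10)
per_year  <- 8 * per_month + rnorm(100, sd = 10)

# Fit the same kind of model that geom_smooth(method = "lm") draws
fit <- lm(per_year ~ per_month)
r2  <- summary(fit)$r.squared

# Build the annotation label from the fitted value
r2_label <- paste0("R^2 = ", round(r2, 2))
print(r2_label)
```

The resulting string could then be passed to `annotate("text", label = r2_label, ...)` in place of the fixed text.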
PHASE 6 : Act
Data-Driven Decision-Making :
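Blueground & RoomPicks are the top 2 hosts in NYC, with summed listing prices of $175,662 and $318,578 respectively.
Blueground has the most listings of any host in NYC. Its summed host_list_count is 276,676, which equals 526 × 526 (the maximum host_list_count in the data, counted once per listing), so this corresponds to about 526 listings rather than 276,676.
Since 2017, hosts have increasingly taken to Airbnb, and the listings chart shows an upward trend.
Hotel rooms have the highest average price of the four room types.
Manhattan alone accounts for 62.5% of the summed host_list_count. By row count it holds 17,635 of 42,904 listings (about 41%), still the largest share of any area.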
Suggestion(s) :
Apart from hotel rooms, the other room types are cheaper and will appeal to travellers.
Travellers may like staying in the Bronx: they can still get around the city, and it has the lowest average price of the five areas.
Staten Island is less crowded but has a higher average price; travellers who want fewer crowds can stay there.
THANK YOU