1 Summary

This research analyzes Airbnb listings in Bangkok in 2022. It has several limitations: the data source is limited; the description of each variable lacks detail; the property category is based only on availability_365 rather than on total visitors, because of the limited data; and the availability_365 variable needs to be updated.

Based on the analysis, total reviews followed a bullish trend in 2022, but there was a downtrend in Nov - Dec 2022, so the company has to learn from it and develop a strategy for the future. Next, the company should raise awareness among and educate property owners in Bang Bon, Nong Chok, Nong Khaem, Thung Khru, and Thawi Watthana for the Entire home/apt room type, and in Bang Bon, Bueng Kum, Khlong Sam Wa, Nong Chok, Nong Khaem, and Thung Khru for the Private room type, because these two room types account for a large proportion of in demand properties in Bangkok. In addition, Vadhana and Khlong Toei have a large proportion of not really on demand properties, followed by Huai Khwang and Ratchathewi, so the company has to encourage and support hosts there to win more loyal customers. Finally, 50% of the price distribution lies between HK$ 1,100 and HK$ 1,800 for the largest group of in demand properties (especially the Entire home/apt and Private room types) in Bangkok in 2022.

2 Preface

2.1 Background

Airbnb, Inc. is an American San Francisco-based company operating an online marketplace for short-term homestays and experiences. The company acts as a broker and charges a commission from each booking. The company was founded in 2008 by Brian Chesky, Nathan Blecharczyk, and Joe Gebbia.

2.2 Data Source

Airbnb_Bangkok Dataset (http://insideairbnb.com/get-the-data/)

2.3 Data Description

No. Feature Description
1. id Airbnb’s unique identifier for the listing
2. name Name of the listing
3. host_id Airbnb’s unique identifier for the host/user
4. host_name Name of the host. Usually just the first name
5. neighbourhood The listing’s region
6. latitude Uses the World Geodetic System (WGS84) projection for latitude
7. longitude Uses the World Geodetic System (WGS84) projection for longitude
8. room_type The listing’s room type: Entire home/apt, Hotel room, Private room, Shared room
9. price Daily price in HK$ (1HK$ = Rp. 1,932.92)
10. minimum_nights Minimum number of night stay for the listing (calendar rules may be different)
11. number_of_reviews The number of reviews the listing has
12. reviews_per_month The average number of reviews per month the listing has over the lifetime of the listing
13. availability_365 The length of time the building has not been rented in a year
14. availability_category The list’s category based on the length of time the building has not been rented in a year
15. number_of_reviews_ltm The number of reviews the listing has (in the last 12 months)

2.4 List Packages

# Data Cleaning
library(readr) 
library(tidyverse) 
library(dplyr) 
library(lubridate) 
library(glue) 
library(ggplot2) 
library(plotly) 
library(scales) 
library(leaflet)
library(treemap)
library(sunburstR)

3 Data Preprocessing

3.1 Read & Extracting Data

airbnb <- read.csv("data_input/listings.csv",encoding= "latin1")
airbnb <- as.data.frame(airbnb)
airbnb
review <- read.csv("data_input/reviews_bangkok.csv")
review

4 Data Wrangling

4.1 Data Inspection

In this part, the research uses 15 variables, in accordance with its needs.

airbnb <- airbnb %>% 
  select(-c(5,13,15,18))

head(airbnb,200)
tail(airbnb,200)
# Subsetting for year = 2022
review <- review %>% 
  filter(year(ymd(date))%in%2022)

head(review)
# review in 2022
review_2022 <- review %>% 
  filter(year(ymd(date))%in%2022)
# change data type - date
review <- review %>% 
  mutate(date=as_date(date))

review_2022 <- review_2022 %>% 
  mutate(date=as_date(date))

#Parse date
# Month
review <- review %>% 
  mutate(month = month(date,label = T, abbr = F))
review_2022 <- review_2022 %>% 
  mutate(b=month(date,label=T))
review_2022 <- review_2022 %>% 
  mutate(month = month(date,label = T, abbr = F))
review_2022 <- review_2022 %>% 
  mutate(bulan = month(date,label = F))

# Year
review <- review %>% 
  mutate(year = year(date))
review_2022 <- review_2022 %>% 
  mutate(year = year(date))
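
The month and year parsing above relies on lubridate. As a minimal sketch, the same fields can be derived with base R only. The `toy_review` data frame is hypothetical and only mimics the shape of the review table.

```r
# Hypothetical toy data shaped like the review table (not the real dataset)
toy_review <- data.frame(
  listing_id = c(1, 1, 2, 3),
  date = c("2021-12-30", "2022-01-15", "2022-11-02", "2022-12-11"),
  stringsAsFactors = FALSE
)

# Parse the character dates, then derive the year and the month number
toy_review$date  <- as.Date(toy_review$date)
toy_review$year  <- as.integer(format(toy_review$date, "%Y"))
toy_review$bulan <- as.integer(format(toy_review$date, "%m"))

# Keep only the 2022 reviews
toy_2022 <- toy_review[toy_review$year == 2022, ]
toy_2022
```

Numeric month values such as `bulan` are convenient for a continuous x axis, which is how the trend plot later uses them.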

Create a function to categorize the availability_365 variable.

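# Define a function
# 0-146 days available   -> "in demand"
# 147-269 days available -> "on the average"
# 270-365 days available -> "not really on demand"

convert_availability <- function(y){
  if(y <= 146)
  {
    "in demand"
  }
  else if(y < 270)
  {
    "on the average"
  }
  else
  {
    "not really on demand"
  }
}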

# Implementation
airbnb$availability_category <- sapply(X = airbnb$availability_365, FUN = convert_availability)

# Change a column series position
airbnb <- airbnb %>% 
  relocate(availability_category,.after = availability_365)
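
A vectorized alternative to the element-wise function can be sketched with base R's cut(), assuming the intended bands are 0-146, 147-269, and 270-365 available days. The `avail` vector is hypothetical and only covers the boundary cases.

```r
# Hypothetical availability values, including the boundary cases
avail <- c(0, 146, 147, 269, 270, 365)

# right = TRUE closes each interval on the right:
# (-Inf,146] -> in demand, (146,269] -> on the average, (269,Inf] -> not really on demand
category <- cut(
  avail,
  breaks = c(-Inf, 146, 269, Inf),
  labels = c("in demand", "on the average", "not really on demand"),
  right = TRUE
)

data.frame(avail, category)
```

cut() returns a factor directly, so no separate as.factor() step is needed for the new column.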

# Check dataframe
head(airbnb)
glimpse(airbnb)
#> Rows: 15,854
#> Columns: 15
#> $ id                    <dbl> 27934, 27979, 28745, 35780, 941865, 1704776, 487…
#> $ name                  <chr> "Nice room with superb city view", "Easy going l…
#> $ host_id               <int> 120437, 120541, 123784, 153730, 610315, 2129668,…
#> $ host_name             <chr> "Nuttee", "Emy", "Familyroom", "Sirilak", "Kasem…
#> $ neighbourhood         <chr> "Ratchathewi", "Bang Na", "Bang Kapi", "Din Daen…
#> $ latitude              <dbl> 13.75983, 13.66818, 13.75232, 13.78823, 13.76872…
#> $ longitude             <dbl> 100.5413, 100.6167, 100.6240, 100.5726, 100.6334…
#> $ room_type             <chr> "Entire home/apt", "Private room", "Private room…
#> $ price                 <int> 1905, 1316, 800, 1286, 1905, 1000, 1558, 1461, 1…
#> $ minimum_nights        <int> 3, 1, 60, 7, 1, 250, 3, 1, 3, 2, 2, 15, 2, 2, 30…
#> $ number_of_reviews     <int> 65, 0, 0, 2, 0, 19, 1, 0, 10, 4, 27, 129, 208, 3…
#> $ reviews_per_month     <dbl> 0.50, NA, NA, 0.03, NA, 0.17, 0.01, NA, 0.09, 0.…
#> $ availability_365      <int> 353, 358, 365, 323, 365, 365, 365, 365, 365, 87,…
#> $ availability_category <chr> "not really on demand", "not really on demand", …
#> $ number_of_reviews_ltm <int> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …
glimpse(review)
#> Rows: 54,942
#> Columns: 4
#> $ listing_id <dbl> 35780, 145343, 145343, 145343, 145343, 145343, 145343, 1453…
#> $ date       <date> 2022-04-01, 2022-05-05, 2022-05-08, 2022-06-06, 2022-06-24…
#> $ month      <ord> April, May, May, June, June, July, July, August, August, Au…
#> $ year       <dbl> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022,…

4.2 Change The Data Type

airbnb <- airbnb %>% 
  mutate(neighbourhood = as.factor(neighbourhood),
         room_type = as.factor(room_type),
         availability_category = as.factor(availability_category))
glimpse(airbnb)
#> Rows: 15,854
#> Columns: 15
#> $ id                    <dbl> 27934, 27979, 28745, 35780, 941865, 1704776, 487…
#> $ name                  <chr> "Nice room with superb city view", "Easy going l…
#> $ host_id               <int> 120437, 120541, 123784, 153730, 610315, 2129668,…
#> $ host_name             <chr> "Nuttee", "Emy", "Familyroom", "Sirilak", "Kasem…
#> $ neighbourhood         <fct> Ratchathewi, Bang Na, Bang Kapi, Din Daeng, Bang…
#> $ latitude              <dbl> 13.75983, 13.66818, 13.75232, 13.78823, 13.76872…
#> $ longitude             <dbl> 100.5413, 100.6167, 100.6240, 100.5726, 100.6334…
#> $ room_type             <fct> Entire home/apt, Private room, Private room, Pri…
#> $ price                 <int> 1905, 1316, 800, 1286, 1905, 1000, 1558, 1461, 1…
#> $ minimum_nights        <int> 3, 1, 60, 7, 1, 250, 3, 1, 3, 2, 2, 15, 2, 2, 30…
#> $ number_of_reviews     <int> 65, 0, 0, 2, 0, 19, 1, 0, 10, 4, 27, 129, 208, 3…
#> $ reviews_per_month     <dbl> 0.50, NA, NA, 0.03, NA, 0.17, 0.01, NA, 0.09, 0.…
#> $ availability_365      <int> 353, 358, 365, 323, 365, 365, 365, 365, 365, 87,…
#> $ availability_category <fct> not really on demand, not really on demand, not …
#> $ number_of_reviews_ltm <int> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …
review <- review %>% 
  mutate(date=ymd(date))
glimpse(review)
#> Rows: 54,942
#> Columns: 4
#> $ listing_id <dbl> 35780, 145343, 145343, 145343, 145343, 145343, 145343, 1453…
#> $ date       <date> 2022-04-01, 2022-05-05, 2022-05-08, 2022-06-06, 2022-06-24…
#> $ month      <ord> April, May, May, June, June, July, July, August, August, Au…
#> $ year       <dbl> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022,…

4.3 Check Missing Values

# Check Missing Values
colSums(is.na(airbnb))
#>                    id                  name               host_id 
#>                     0                     0                     0 
#>             host_name         neighbourhood              latitude 
#>                     0                     0                     0 
#>             longitude             room_type                 price 
#>                     0                     0                     0 
#>        minimum_nights     number_of_reviews     reviews_per_month 
#>                     0                     0                  5790 
#>      availability_365 availability_category number_of_reviews_ltm 
#>                     0                     0                     0
# Treatment
airbnb_clean <- replace_na(airbnb,list(reviews_per_month=0))
# Recheck
colSums(is.na(airbnb_clean))
#>                    id                  name               host_id 
#>                     0                     0                     0 
#>             host_name         neighbourhood              latitude 
#>                     0                     0                     0 
#>             longitude             room_type                 price 
#>                     0                     0                     0 
#>        minimum_nights     number_of_reviews     reviews_per_month 
#>                     0                     0                     0 
#>      availability_365 availability_category number_of_reviews_ltm 
#>                     0                     0                     0

The treatment replaces the NA values with 0. A listing with number_of_reviews = 0 has reviews_per_month = NA, which indicates that the listing has no reviews, so reviews_per_month can be set to 0.
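
The assumption behind this treatment can be checked before replacing anything. The following is a minimal sketch on a hypothetical `toy` data frame, not on the real listings.

```r
# Hypothetical toy listings: NA appears only where there are no reviews
toy <- data.frame(
  number_of_reviews = c(65, 0, 2, 0),
  reviews_per_month = c(0.50, NA, 0.03, NA)
)

# TRUE when every NA in reviews_per_month belongs to a listing with zero reviews
na_only_when_no_reviews <- all(
  toy$number_of_reviews[is.na(toy$reviews_per_month)] == 0
)

# Replace the NA values with 0 only after the check
toy$reviews_per_month[is.na(toy$reviews_per_month)] <- 0
toy
```

If the check returned FALSE, some listings would have reviews but no monthly rate, and replacing NA with 0 would understate them.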

# Check Missing Values
colSums(is.na(review))
#> listing_id       date      month       year 
#>          0          0          0          0

4.4 Check Duplicate Data

airbnb_clean %>% 
  duplicated() %>% 
  sum()
#> [1] 0
airbnb_clean[duplicated(airbnb_clean$host_id),]

5 EDA

5.1 Data’s Summary

summary(airbnb_clean)
#>        id                         name              host_id         
#>  Min.   :             27934   Length:15854       Min.   :    58920  
#>  1st Qu.:          21045092   Class :character   1st Qu.: 39744308  
#>  Median :          35037340   Mode  :character   Median :122455569  
#>  Mean   :157939679702000000                      Mean   :154105784  
#>  3rd Qu.:          52561542                      3rd Qu.:239054688  
#>  Max.   :790816217344000000                      Max.   :492665929  
#>                                                                     
#>   host_name             neighbourhood     latitude       longitude    
#>  Length:15854       Vadhana    :2153   Min.   :13.53   Min.   :100.3  
#>  Class :character   Khlong Toei:2097   1st Qu.:13.72   1st Qu.:100.5  
#>  Mode  :character   Huai Khwang:1125   Median :13.74   Median :100.6  
#>                     Ratchathewi:1114   Mean   :13.75   Mean   :100.6  
#>                     Bang Rak   : 827   3rd Qu.:13.76   3rd Qu.:100.6  
#>                     Sathon     : 809   Max.   :13.95   Max.   :100.9  
#>                     (Other)    :7729                                  
#>            room_type        price         minimum_nights    number_of_reviews
#>  Entire home/apt:8912   Min.   :      0   Min.   :   1.00   Min.   :   0.00  
#>  Hotel room     : 649   1st Qu.:    900   1st Qu.:   1.00   1st Qu.:   0.00  
#>  Private room   :5770   Median :   1429   Median :   1.00   Median :   2.00  
#>  Shared room    : 523   Mean   :   3218   Mean   :  15.29   Mean   :  16.65  
#>                         3rd Qu.:   2429   3rd Qu.:   7.00   3rd Qu.:  13.00  
#>                         Max.   :1100000   Max.   :1125.00   Max.   :1224.00  
#>                                                                              
#>  reviews_per_month availability_365          availability_category
#>  Min.   : 0.0000   Min.   :  0.0    in demand           :4181     
#>  1st Qu.: 0.0000   1st Qu.:138.0    not really on demand:8823     
#>  Median : 0.0900   Median :309.0    on the average      :2850     
#>  Mean   : 0.5162   Mean   :244.4                                  
#>  3rd Qu.: 0.6700   3rd Qu.:360.0                                  
#>  Max.   :19.1300   Max.   :365.0                                  
#>                                                                   
#>  number_of_reviews_ltm
#>  Min.   :  0.000      
#>  1st Qu.:  0.000      
#>  Median :  0.000      
#>  Mean   :  3.482      
#>  3rd Qu.:  3.000      
#>  Max.   :325.000      
#> 

Insight :
- There are 4,181 properties in the in demand category, 8,823 in the not really on demand category, and 2,850 in the on the average category.
- There are 8,912 Entire home/apt listings, 649 Hotel room listings, 5,770 Private room listings, and 523 Shared room listings.
- The median of reviews_per_month is 0.0900 and the mean is 0.5162.
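
The counts above are easier to compare as percentages. As a small sketch, the counts from summary(airbnb_clean) can be typed in by hand and converted with prop.table(). The vector names are illustrative.

```r
# Counts copied from the summary output above
category_counts <- c("in demand" = 4181,
                     "not really on demand" = 8823,
                     "on the average" = 2850)
room_counts <- c("Entire home/apt" = 8912,
                 "Hotel room" = 649,
                 "Private room" = 5770,
                 "Shared room" = 523)

# Share of each group in percent, rounded to one decimal
category_share <- round(100 * prop.table(category_counts), 1)
room_share     <- round(100 * prop.table(room_counts), 1)

category_share
room_share
```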

5.2 Trend Analysis

1.1. Total Review Month Trend in Year.

options(dplyr.summarise.inform = FALSE)

review_month <- review_2022 %>%
  filter(year%in%2022) %>% 
  group_by(bulan,month) %>% 
  count(month)

review_month <- review_month %>% 
  mutate(label9 = glue("
                       Month : {month}
                       Total : {comma(n)}
                       "))
plot_1 <- ggplot(data=review_month, aes(x=bulan, y=n)) + geom_line(color = "red")+geom_point(aes(text=label9))+
  scale_x_continuous(breaks = seq(1,12,1))+scale_y_continuous(labels = comma) + labs(
    title = "Total Review Month Trend in Bangkok 2022",
    x = "Month",
    y = "Total"
  )+theme(plot.background = element_rect("#78C2AD")) 
ggplotly(plot_1, tooltip = "text")

Insight :
- There was a bullish trend from February until November 2022.
- The big downtrend occurred in December 2022.
- The big uptrend occurred in October to November 2022.
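
Uptrends and downtrends like these can also be read from month-over-month differences instead of the chart alone. The sketch below uses hypothetical monthly totals, not the real 2022 figures.

```r
# Hypothetical monthly review totals (not the real 2022 figures)
monthly <- data.frame(
  bulan = 1:6,
  n = c(100, 120, 150, 150, 210, 160)
)

# Absolute and percentage change against the previous month
previous <- c(NA, head(monthly$n, -1))
monthly$change     <- monthly$n - previous
monthly$pct_change <- round(100 * monthly$change / previous, 1)

# Months with the largest rise and the largest drop (NA is ignored)
largest_rise <- monthly$bulan[which.max(monthly$change)]
largest_drop <- monthly$bulan[which.min(monthly$change)]

monthly
```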

1.2. Total Review per Day in a Year.

review_month <- review_2022 %>% 
  count(bulan,date,month)

review_month <- review_month %>% 
  mutate(label2 = glue("
                       Month : {month}
                       Date : {date}
                       Total : {n}
                       "))
plot_1 <- ggplot(data=review_month, aes(x=date, y=n)) + geom_line(color="orange") + geom_point(aes(text=label2)) + scale_x_date(breaks = date_breaks("months"),
  labels = date_format("%b")) + labs(
    title = "Total Review Day Trend in Bangkok 2022",
    x = "Month",
    y = "Total"
  ) +
  theme_dark()
ggplotly(plot_1, tooltip = "text")

Insight :
- A sideways trend occurred from the end of August until mid-October 2022.
- A downtrend occurred on 11 December 2022.
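
Daily counts are noisy, so a moving average can make a sideways period easier to see. This is a minimal sketch with hypothetical daily totals and a 3-day window; the window length is an assumption, not something the analysis above used.

```r
# Hypothetical daily review totals (not the real data)
daily <- data.frame(
  date = seq(as.Date("2022-01-01"), by = "day", length.out = 10),
  n = c(10, 12, 11, 13, 40, 12, 11, 10, 12, 11)
)

# Centered 3-day moving average.
# stats::filter is namespaced because dplyr also exports filter().
daily$ma3 <- as.numeric(stats::filter(daily$n, rep(1 / 3, 3), sides = 2))

daily
```

The first and last rows have no complete window, so their moving average is NA.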

5.3 Ranking Analysis

2.1. Highest Total Review Ranking Based On Room Type.

# Top properties with the highest total reviews
zz <- airbnb_clean %>% 
  group_by(room_type,neighbourhood) %>% 
  summarise(Total = sum(number_of_reviews_ltm)) %>% 
  arrange(desc(Total)) %>% 
  slice(1:2)

zz <- zz %>% 
  mutate(label4=glue("Total review: {comma(Total)}"))

zz$room_type <- factor(zz$room_type, levels = c("Entire home/apt","Hotel room","Private room","Shared room"))

# GGPLOT

zz$urutan <- rank(x=zz$Total,ties.method = "first")

plot3 <- ggplot(zz,aes(x=Total, y= reorder(room_type,Total), text=label4)) +
  geom_col(aes(fill = neighbourhood,group = urutan),
           color="black",
           position = "dodge")+scale_x_continuous(labels = comma)+
  scale_fill_manual(values =c("#78C2AD", "#98B0A9","#375F6F","#364B45")) +
  labs(
    title = "Top Property - Highest Total Review",
    x = "Total Review",
    y = "Room Type",
    fill = "Region"
  ) +facet_wrap(facets = "neighbourhood",nrow = 4, scales = "free")+theme(plot.background = element_rect("#78C2AD"))

ggplotly(plot3,tooltip = "text")
  • Using base R.
agg_tot_rating <- aggregate(x = number_of_reviews_ltm~neighbourhood+room_type, data = airbnb_clean, FUN = sum)

# Entire home/apt
agg_21 <- agg_tot_rating[agg_tot_rating$room_type=="Entire home/apt",]
agg_21 <- agg_21[order(agg_21$number_of_reviews_ltm,decreasing = T),]
agg_21 <- head(agg_21,3)

plot_21 <- ggplot(agg_21,aes(x=number_of_reviews_ltm,y=reorder(neighbourhood,number_of_reviews_ltm))) + geom_col(aes(fill=number_of_reviews_ltm),color = "orange")+geom_text(aes(label=number_of_reviews_ltm),color="brown", nudge_x = -2) + labs(title= "Top 3 Region with High Total Review",subtitle = "Room Type : Entire home/apt" ,x = "Total Review",y="Region",fill="Total Review")
plot_21

# Hotel room
agg_22 <- agg_tot_rating[agg_tot_rating$room_type=="Hotel room",]
agg_22 <- agg_22[order(agg_22$number_of_reviews_ltm,decreasing = T),]
agg_22 <- head(agg_22,3)

plot_22 <- ggplot(agg_22,aes(x=number_of_reviews_ltm,y=reorder(neighbourhood,number_of_reviews_ltm))) + geom_col(aes(fill=number_of_reviews_ltm),color = "blue")+ scale_fill_gradient(low="red",high="orange")+geom_text(aes(label=number_of_reviews_ltm),color="black", nudge_x = -2) + labs(title= "Top 3 Region with High Total Review",subtitle = "Room Type : Hotel room" ,x = "Total Review",y="Region",fill="Total Review")
plot_22

# Private room
agg_23 <- agg_tot_rating[agg_tot_rating$room_type=="Private room",]
agg_23 <- agg_23[order(agg_23$number_of_reviews_ltm,decreasing = T),]
agg_23 <- head(agg_23,3)

plot_23 <- ggplot(agg_23,aes(x=number_of_reviews_ltm,y=reorder(neighbourhood,number_of_reviews_ltm))) + geom_col(aes(fill=number_of_reviews_ltm),color = "orange")+geom_text(aes(label=number_of_reviews_ltm),color="red", nudge_x = -2) + labs(title= "Top 3 Region with High Total Review",subtitle = "Room Type : Private room" ,x = "Total Review",y="Region",fill="Total Review")
plot_23

# Shared room
agg_24 <- agg_tot_rating[agg_tot_rating$room_type=="Shared room",]
agg_24 <- agg_24[order(agg_24$number_of_reviews_ltm,decreasing = T),]
agg_24 <- head(agg_24,3)

plot_24 <- ggplot(agg_24,aes(x=number_of_reviews_ltm,y=reorder(neighbourhood,number_of_reviews_ltm))) + geom_col(aes(fill=number_of_reviews_ltm),color = "blue")+ scale_fill_gradient(low="red",high="orange")+geom_text(aes(label=number_of_reviews_ltm),color="black", nudge_x = -2) + labs(title= "Top 3 Region with High Total Review",subtitle = "Room Type : Shared room" ,x = "Total Review",y="Region",fill="Total Review")
plot_24

Insight :
- There are 4 regions with the highest total reviews: Khlong Toei, Lat Krabang, Phra Nakhon, and Vadhana.
- Two room types have the highest total reviews: Entire home/apt in Khlong Toei and Vadhana, and Private room in Khlong Toei and Phra Nakhon.
- The most-reviewed Hotel room listings are in Lat Krabang and Khlong Toei.
- The most-reviewed Shared room listings are in Lat Krabang and Vadhana.
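
The four per-room-type blocks above repeat the same filter, sort, and head steps. As a sketch, they could be folded into one helper. `top_n_per_type` and the `toy` data with regions A, B, and C are hypothetical.

```r
# Hypothetical toy listings (regions A, B, C are placeholders)
toy <- data.frame(
  neighbourhood = c("A", "B", "C", "A", "B", "C"),
  room_type = c(rep("Entire home/apt", 3), rep("Private room", 3)),
  number_of_reviews_ltm = c(50, 80, 10, 5, 7, 30)
)

# Total reviews per region and room type
totals <- aggregate(number_of_reviews_ltm ~ neighbourhood + room_type,
                    data = toy, FUN = sum)

# Hypothetical helper: top n regions within each room type
top_n_per_type <- function(df, n = 3) {
  pieces <- lapply(split(df, df$room_type), function(g) {
    head(g[order(g$number_of_reviews_ltm, decreasing = TRUE), ], n)
  })
  do.call(rbind, pieces)
}

top2 <- top_n_per_type(totals, n = 2)
top2
```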

2.2. Lowest Total Review Ranking Based On Room Type.

# The 10 regions with the lowest total reviews for each room type
zzl <- airbnb_clean %>% 
  group_by(room_type,neighbourhood) %>% 
  summarise(Total = sum(number_of_reviews_ltm)) %>% 
  arrange(Total) %>% 
  slice(1:10)

zzl <- zzl %>% 
  mutate(label5=glue("Total review: {Total}
                     Region: {neighbourhood}"))

zzl$room_type <- factor(zzl$room_type, levels = c("Entire home/apt","Hotel room","Private room","Shared room"))

# GGPLOT
plot4 <- ggplot(zzl,aes(x=Total, y= reorder(neighbourhood,Total), text=label5)) +
  geom_col(aes(fill = room_type),
           color="black",
           position = "dodge") + scale_fill_manual(values =c("#78C2AD", "#98B0A9","#375F6F","#364B45"))+
  labs(
    title = "Bottom Ranked Property - Lowest Total Review",
    x = "Total Review",
    y = "Region",
    fill = "Room type"
  ) +facet_wrap(facets = "room_type",nrow = 4, scales = "free") +theme(plot.background = element_rect("#78C2AD"))

ggplotly(plot4,tooltip = "text")

Insight :
- The bottom-ranked regions for the Entire home/apt room type, with the lowest total reviews, are Bang Bon, Nong Chok, Nong Khaem, Thung Khru, and Thawi Watthana.
- The bottom-ranked regions for the Hotel room type, with the lowest total reviews, are Bang Khae, Bang Khen, Bang Kho laen, Bangkok Yai, Chatu Chak, Phasi Charoen, and Thon Buri.
- The bottom-ranked regions for the Private room type, with the lowest total reviews, are Bang Bon, Bueng Kum, Khlong Sam Wa, Nong Chok, Nong Khaem, and Thung Khru.
- The bottom-ranked regions for the Shared room type, with the lowest total reviews, are Bang Kapi, Bang Khen, Bang Na, Bang Sue, Bangkok Yai, Chom Thong, Dusit, Don Mueang, Bangkok Noi, and Bang Khun thain.

2.3. What is the average price of property in the 10 regions with the most total reviews?

agg_tot_rev <- aggregate(x = number_of_reviews_ltm ~ neighbourhood+room_type,
                         data = airbnb_clean,
                         FUN = sum)

agg_tot_rev <- agg_tot_rev[order(agg_tot_rev$number_of_reviews_ltm,decreasing = T),]
plot2.3 <- head(agg_tot_rev,10)
plot2.3 <- ggplot(plot2.3, aes(x=number_of_reviews_ltm , y=reorder(neighbourhood,number_of_reviews_ltm))) + geom_col(aes(fill=number_of_reviews_ltm),color="orange") + geom_text(aes(label=number_of_reviews_ltm),color='red',nudge_x=-1) + labs(title = "Top 10 Region with Highest Total Review",x="Total Review", y="Region",fill="Total Review")
plot2.3   

2.4. What is the average price of property in the 10 regions with the lowest total reviews?

agg_tot_rev2 <- aggregate(x = number_of_reviews_ltm ~ neighbourhood,
                         data = airbnb_clean,
                         FUN = sum)
agg_tot_rev2 <- agg_tot_rev2[order(agg_tot_rev2$number_of_reviews_ltm,decreasing = T),]
plot2.4 <- tail(agg_tot_rev2,10)
plot2.4 <- ggplot(plot2.4, aes(x=number_of_reviews_ltm , y=reorder(neighbourhood,number_of_reviews_ltm))) + geom_col(aes(fill=number_of_reviews_ltm),color="orange") + geom_text(aes(label=number_of_reviews_ltm),color='red',nudge_x=-1) + labs(title = "Top 10 Region with Lowest Total Review",x="Total Review", y="Region",fill="Total Review")
plot2.4   
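
The two questions above ask about average price, while the plots show total reviews only. A minimal sketch of how the average price could be attached to the ranked regions follows. The `toy` data and its values are hypothetical.

```r
# Hypothetical toy listings (regions A, B, C are placeholders)
toy <- data.frame(
  neighbourhood = c("A", "A", "B", "B", "C", "C"),
  price = c(1000, 2000, 1500, 1700, 900, 1100),
  number_of_reviews_ltm = c(10, 20, 1, 2, 40, 5)
)

# Total reviews and mean price per region
rev_tot   <- aggregate(number_of_reviews_ltm ~ neighbourhood, data = toy, FUN = sum)
avg_price <- aggregate(price ~ neighbourhood, data = toy, FUN = mean)

# Join both summaries and rank the regions by total reviews
ranked <- merge(rev_tot, avg_price, by = "neighbourhood")
ranked <- ranked[order(ranked$number_of_reviews_ltm, decreasing = TRUE), ]

top_regions    <- head(ranked, 2)  # most reviewed
bottom_regions <- tail(ranked, 2)  # least reviewed
ranked
```

On the real data, head(ranked, 10) and tail(ranked, 10) would give the average price for the 10 most and 10 least reviewed regions.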

5.4 Proportion

3.1. Total Property per Room Type.

df11 <- as.data.frame(table(airbnb_clean$room_type,airbnb_clean$availability_category))

colnames(df11) <- c("Room_Type","Category","Total")

plot31 <- ggplot(df11,aes(x=Total,y=Room_Type)) + 
  geom_col(aes(fill=Category),color = "black",
           position = "stack") + 
  scale_x_continuous(labels = comma) +
  labs(title = "Total Property",
             subtitle = "Based on Room Type & Category",
             x = "Total",
             y = "Room Type",
             fill = "Category"
           )
plot31

Insight :
- Across all regions in Bangkok, the Private room and Entire home/apt room types account for the largest share of properties in the in demand and on the average categories.
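
The share of each room type within a category can be computed directly rather than read from the stacked bars. This is a sketch on a hypothetical `toy` data frame using prop.table() with margin = 2 (shares within each category column).

```r
# Hypothetical toy listings
toy <- data.frame(
  room_type = c("Entire home/apt", "Entire home/apt", "Entire home/apt",
                "Private room", "Private room", "Shared room"),
  availability_category = c("in demand", "in demand", "not really on demand",
                            "in demand", "on the average", "not really on demand")
)

# Cross-tabulate, then convert each category column to proportions
counts <- table(toy$room_type, toy$availability_category)
share_by_category <- prop.table(counts, margin = 2)

round(share_by_category, 2)
```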

3.2. Total Property Proportion for All Region, Room Type, and Category.

# Count properties per region, room type, and category
tot_prop_type <- airbnb_clean %>% 
  group_by(neighbourhood,room_type,availability_category) %>% 
  summarise(count = n()) %>% 
  ungroup()


# Reformat data for the sunburstR package
tot_prop_type <- tot_prop_type %>% 
  mutate(path = paste(neighbourhood, room_type, availability_category, sep="-")) %>% 
  dplyr::select(path, count)

# Plot
p <- sunburst(tot_prop_type, legend=T) 
p

Insight :
- Vadhana has the most properties, followed by Khlong Toei. Both have a large proportion of Entire home/apt and Private room listings in Bangkok.
- Vadhana, Khlong Toei, and Huai Khwang have a large proportion of in demand properties.
- Vadhana and Khlong Toei have a large proportion of not really on demand properties, followed by Huai Khwang and Ratchathewi.
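
The sunburst path joins the three levels with "-", so a label that itself contains a hyphen would be read as an extra level. This assumes the plotting package splits the path on "-", as the sep = "-" in the code above suggests. A defensive sketch, with a hypothetical `clean` helper and made-up region names:

```r
# Hypothetical toy counts; "Region-One" contains a hyphen on purpose
toy <- data.frame(
  neighbourhood = c("Region-One", "Region Two"),
  room_type = c("Private room", "Entire home/apt"),
  availability_category = c("in demand", "on the average"),
  count = c(3, 5)
)

# Hypothetical helper: replace hyphens inside a label with a space
clean <- function(x) gsub("-", " ", as.character(x), fixed = TRUE)

toy$path <- paste(clean(toy$neighbourhood),
                  clean(toy$room_type),
                  clean(toy$availability_category),
                  sep = "-")

# Each path should split back into exactly three levels
levels_per_path <- lengths(strsplit(toy$path, "-", fixed = TRUE))
toy[, c("path", "count")]
```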

5.5 Distribution

4.1. What is the price range for each room type?

# In demand category
# Price -> Q1 = 900 , Q3 = 2429 , 50% data of price is located in the range of Q1 - Q3
mean_harga <- airbnb_clean %>% 
  filter(availability_category == "in demand") %>% 
  group_by(room_type) %>% 
  ungroup()

boxp1 <- mean_harga[(mean_harga$price >= 900 & mean_harga$price <= 2500), ]

plot(x = boxp1$room_type, y = boxp1$price)

# On the average category
# Price -> Q1 = 900 , Q3 = 2429 , 50% data of price is located in the range of Q1 - Q3
mean_harga <- airbnb_clean %>% 
  filter(availability_category == "on the average") %>% 
  group_by(room_type) %>% 
  ungroup()

boxp2 <- mean_harga[(mean_harga$price >= 900 & mean_harga$price <= 2500), ]

plot(x = boxp2$room_type, y = boxp2$price)

# Not really on demand
# Price -> Q1 = 900 , Q3 = 2429 , 50% data of price is located in the range of Q1 - Q3
mean_harga <- airbnb_clean %>% 
  filter(availability_category == "not really on demand") %>% 
  group_by(room_type) %>% 
  ungroup()

boxp3 <- mean_harga[(mean_harga$price >= 900 & mean_harga$price <= 2500), ]

plot(x = boxp3$room_type, y = boxp3$price)

# Combine All Property's Categories
# Price -> Q1 = 900 , Q3 = 2429 , 50% data of price is located in the range of Q1 - Q3
# No category filter here, so every availability category is included
boxp4 <- airbnb_clean[(airbnb_clean$price >= 900 & airbnb_clean$price <= 2500), ]

plot(x = boxp4$room_type, y = boxp4$price)

Insight :
- Within the Q1 - Q3 price range (900 to 2,429), prices center around HK$ 1,500 across all property categories.
- Within the Q1 - Q3 price range (900 to 2,429), prices center on HK$ 1,450 for in demand properties with the Entire home/apt room type.
- Within the Q1 - Q3 price range (900 to 2,429), prices center on HK$ 1,771 for in demand properties with the Hotel room type.
- Within the Q1 - Q3 price range (900 to 2,429), prices center around HK$ 1,320 - HK$ 1,350 for in demand properties with the Private and Shared room types.
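
The central price per room type can also be computed instead of read from the box plots. The sketch below uses a hypothetical `toy` data frame and the same 900 to 2,500 band as the code above; it assumes the median is the statistic of interest.

```r
# Hypothetical toy listings (prices are made up)
toy <- data.frame(
  room_type = c("Entire home/apt", "Entire home/apt", "Entire home/apt",
                "Hotel room", "Hotel room", "Private room"),
  price = c(1000, 1400, 2000, 1800, 5000, 1300)
)

# Keep prices inside the band used for the box plots
band <- toy[toy$price >= 900 & toy$price <= 2500, ]

# Median price per room type within the band
median_price <- tapply(band$price, band$room_type, median)
median_price
```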

4.2. What is the price range for each property category, by room type?

# Price -> Q1 = 900 , Q3 = 2429 , 50% data of price is located in the range of Q1 - Q3
bx <- airbnb_clean %>% 
  filter(price >= 900 & price <= 2500)

bx_plot <- ggplot(bx,aes(x=availability_category, y=price,color=room_type, fill = room_type)) + geom_boxplot(outlier.color = "red") 

ggplotly(bx_plot) %>% 
  layout(boxmode='group')
# + geom_jitter(mapping = aes(size=reviews_per_month,color=room_type))

Insight :

- 50% of the price data lies between Q1 and Q3, so the research takes Q1 = 900 and Q3 = 2429 from summary(airbnb_clean) to subset the data for the box plot visualization (the code rounds the upper bound to 2,500). The box plots then show the specific price distribution, with its own Q1 and Q3, for each group.
- The 3 property categories with the Entire home/apt room type have a similar price range, from HK$ 1,100 to HK$ 1,800.
- For in demand properties, the Entire home/apt room type ranges from HK$ 1,114 to HK$ 1,800.
- For in demand properties, the Hotel room type ranges from HK$ 1,300.25 to HK$ 2,200.
- For in demand properties, the Private room type ranges from HK$ 1,100 to HK$ 1,702.
- For in demand properties, the Shared room type ranges from HK$ 1,200 to HK$ 1,350.
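
The Q1 and Q3 bounds are typed in by hand in the code above. As a sketch, they could be computed with quantile() so the subset follows the data. The `price` vector is hypothetical, with one extreme value to mimic the large maximum seen in the summary.

```r
# Hypothetical prices, including one extreme value
price <- c(0, 500, 900, 1200, 1429, 1800, 2429, 3000, 1100000)

# First and third quartiles (default quantile type 7)
q <- quantile(price, probs = c(0.25, 0.75), names = FALSE)

# Keep the middle 50% of the prices
middle <- price[price >= q[1] & price <= q[2]]

q
middle
```

Subsetting on quartiles keeps extreme prices out of the box plots without choosing a cut-off by eye.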