This research analyzes Airbnb listings in Bangkok in 2022. It has several limitations: the data source is limited, the description of each variable lacks detail, the property category is based only on availability_365 rather than on total visitors because visitor data is unavailable, and the availability_365 variable needs to be updated.
The analysis shows that total reviews followed a bullish trend in 2022 but turned down in November to December 2022, so the company should study this decline and develop a strategy for the future. Next, the company should raise awareness among and educate property owners in Bang Bon, Nong Chok, Nong Khaem, Thung Khru, and Thawi Watthana for the Entire home/apt room type, and in Bang Bon, Bueng Kum, Khlong Sam Wa, Nong Chok, Nong Khaem, and Thung Khru for the Private room type, because these two room types make up a large share of in demand properties in Bangkok. In addition, Vadhana and Khlong Toei have a large proportion of not really on demand properties, followed by Huai Khwang and Ratchathewi, so the company should encourage and support owners there to attract more loyal customers. Finally, for the largest share of in demand properties (especially the Entire home/apt and Private room types) in Bangkok in 2022, the middle 50% of prices falls between HK$ 1,100 and HK$ 1,800.
Airbnb, Inc. is an American San Francisco-based company operating an online marketplace for short-term homestays and experiences. The company acts as a broker and charges a commission from each booking. The company was founded in 2008 by Brian Chesky, Nathan Blecharczyk, and Joe Gebbia.
[Airbnb_Bangkok Dataset](http://insideairbnb.com/get-the-data/)
| No. | Feature | Description |
|---|---|---|
| 1. | id | Airbnb’s unique identifier for the listing |
| 2. | name | Name of the listing |
| 3. | host_id | Airbnb’s unique identifier for the host/user |
| 4. | host_name | Name of the host. Usually just the first name |
| 5. | neighbourhood | The listing’s region |
| 6. | latitude | Uses the World Geodetic System (WGS84) projection for latitude |
| 7. | longitude | Uses the World Geodetic System (WGS84) projection for longitude |
| 8. | room_type | The listing’s room type: Entire home/apt, Hotel room, Private room, Shared room |
| 9. | price | Daily price in HK$ (1HK$ = Rp. 1,932.92) |
| 10. | minimum_nights | Minimum number of night stay for the listing (calendar rules may be different) |
| 11. | number_of_reviews | The number of reviews the listing has |
| 12. | reviews_per_month | Average number of reviews per month over the lifetime of the listing |
| 13. | availability_365 | Number of days in a year that the listing has not been rented (is still available) |
| 14. | availability_category | The listing’s category based on availability_365 |
| 15. | number_of_reviews_ltm | The number of reviews the listing has (in the last 12 months) |
# Data Cleaning
library(readr)
library(tidyverse)
library(dplyr)
library(lubridate)
library(glue)
library(ggplot2)
library(plotly)
library(scales)
library(leaflet)
library(treemap)
library(sunburstR)
airbnb <- read.csv("data_input/listings.csv", encoding = "latin1")
airbnb <- as.data.frame(airbnb)
airbnb
review <- read.csv("data_input/reviews_bangkok.csv")
review
In this part, this research will use 15 variables in accordance with the needs of the research.
airbnb <- airbnb %>%
select(-c(5,13,15,18))
head(airbnb,200)
tail(airbnb,200)
# Subsetting for year = 2022
review <- review %>%
  filter(year(ymd(date))%in%2022)
head(review)
# review in 2022
review_2022 <- review %>%
  filter(year(ymd(date))%in%2022)
# change data type - date
review <- review %>%
mutate(date=as_date(date))
review_2022 <- review_2022 %>%
mutate(date=as_date(date))
#Parse date
# Month
review <- review %>%
mutate(month = month(date,label = T, abbr = F))
review_2022 <- review_2022 %>%
mutate(b=month(date,label=T))
review_2022 <- review_2022 %>%
mutate(month = month(date,label = T, abbr = F))
review_2022 <- review_2022 %>%
mutate(bulan = month(date,label = F))
# Year
review <- review %>%
mutate(year = year(date))
review_2022 <- review_2022 %>%
  mutate(year = year(date))
Creating a function to categorize the availability_365 variable.
# Define a function: 0-146 days is "in demand", 147-269 is "on the average", 270-365 is "not really on demand"
convert_availability <- function(y){
  if (y <= 146) {
    "in demand"
  } else if (y < 270) {
    "on the average"
  } else {
    "not really on demand"
  }
}
# Implementation
airbnb$availability_category <- sapply(X = airbnb$availability_365, FUN = convert_availability)
# Change a column series position
airbnb <- airbnb %>%
relocate(availability_category,.after = availability_365)
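The same categorization can also be done on the whole column at once. The following is only a sketch in base R, assuming the intended bins are 0-146, 147-269, and 270-365 days; the `availability` vector is a hypothetical sample, not data from the listings file.

```r
# Hypothetical sample of availability_365 values (days available per year)
availability <- c(0, 87, 146, 147, 269, 270, 365)

# cut() bins the whole vector in one call. With the default right = TRUE
# each interval is closed on the right: (-Inf,146], (146,269], (269,Inf)
availability_category <- cut(
  availability,
  breaks = c(-Inf, 146, 269, Inf),
  labels = c("in demand", "on the average", "not really on demand")
)

print(data.frame(availability, availability_category))
```

Because `cut()` returns a factor, this variant would also make the later `as.factor()` conversion of the category column unnecessary.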
# Check dataframe
head(airbnb)
glimpse(airbnb)
#> Rows: 15,854
#> Columns: 15
#> $ id <dbl> 27934, 27979, 28745, 35780, 941865, 1704776, 487…
#> $ name <chr> "Nice room with superb city view", "Easy going l…
#> $ host_id <int> 120437, 120541, 123784, 153730, 610315, 2129668,…
#> $ host_name <chr> "Nuttee", "Emy", "Familyroom", "Sirilak", "Kasem…
#> $ neighbourhood <chr> "Ratchathewi", "Bang Na", "Bang Kapi", "Din Daen…
#> $ latitude <dbl> 13.75983, 13.66818, 13.75232, 13.78823, 13.76872…
#> $ longitude <dbl> 100.5413, 100.6167, 100.6240, 100.5726, 100.6334…
#> $ room_type <chr> "Entire home/apt", "Private room", "Private room…
#> $ price <int> 1905, 1316, 800, 1286, 1905, 1000, 1558, 1461, 1…
#> $ minimum_nights <int> 3, 1, 60, 7, 1, 250, 3, 1, 3, 2, 2, 15, 2, 2, 30…
#> $ number_of_reviews <int> 65, 0, 0, 2, 0, 19, 1, 0, 10, 4, 27, 129, 208, 3…
#> $ reviews_per_month <dbl> 0.50, NA, NA, 0.03, NA, 0.17, 0.01, NA, 0.09, 0.…
#> $ availability_365 <int> 353, 358, 365, 323, 365, 365, 365, 365, 365, 87,…
#> $ availability_category <chr> "not really on demand", "not really on demand", …
#> $ number_of_reviews_ltm <int> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …
glimpse(review)
#> Rows: 54,942
#> Columns: 4
#> $ listing_id <dbl> 35780, 145343, 145343, 145343, 145343, 145343, 145343, 1453…
#> $ date <date> 2022-04-01, 2022-05-05, 2022-05-08, 2022-06-06, 2022-06-24…
#> $ month <ord> April, May, May, June, June, July, July, August, August, Au…
#> $ year <dbl> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022,…
airbnb <- airbnb %>%
mutate(neighbourhood = as.factor(neighbourhood),
room_type = as.factor(room_type),
availability_category = as.factor(availability_category))
glimpse(airbnb)
#> Rows: 15,854
#> Columns: 15
#> $ id <dbl> 27934, 27979, 28745, 35780, 941865, 1704776, 487…
#> $ name <chr> "Nice room with superb city view", "Easy going l…
#> $ host_id <int> 120437, 120541, 123784, 153730, 610315, 2129668,…
#> $ host_name <chr> "Nuttee", "Emy", "Familyroom", "Sirilak", "Kasem…
#> $ neighbourhood <fct> Ratchathewi, Bang Na, Bang Kapi, Din Daeng, Bang…
#> $ latitude <dbl> 13.75983, 13.66818, 13.75232, 13.78823, 13.76872…
#> $ longitude <dbl> 100.5413, 100.6167, 100.6240, 100.5726, 100.6334…
#> $ room_type <fct> Entire home/apt, Private room, Private room, Pri…
#> $ price <int> 1905, 1316, 800, 1286, 1905, 1000, 1558, 1461, 1…
#> $ minimum_nights <int> 3, 1, 60, 7, 1, 250, 3, 1, 3, 2, 2, 15, 2, 2, 30…
#> $ number_of_reviews <int> 65, 0, 0, 2, 0, 19, 1, 0, 10, 4, 27, 129, 208, 3…
#> $ reviews_per_month <dbl> 0.50, NA, NA, 0.03, NA, 0.17, 0.01, NA, 0.09, 0.…
#> $ availability_365 <int> 353, 358, 365, 323, 365, 365, 365, 365, 365, 87,…
#> $ availability_category <fct> not really on demand, not really on demand, not …
#> $ number_of_reviews_ltm <int> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …
review <- review %>%
mutate(date=ymd(date))
glimpse(review)
#> Rows: 54,942
#> Columns: 4
#> $ listing_id <dbl> 35780, 145343, 145343, 145343, 145343, 145343, 145343, 1453…
#> $ date <date> 2022-04-01, 2022-05-05, 2022-05-08, 2022-06-06, 2022-06-24…
#> $ month <ord> April, May, May, June, June, July, July, August, August, Au…
#> $ year <dbl> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022,…
# Check Missing Values
colSums(is.na(airbnb))
#> id name host_id
#> 0 0 0
#> host_name neighbourhood latitude
#> 0 0 0
#> longitude room_type price
#> 0 0 0
#> minimum_nights number_of_reviews reviews_per_month
#> 0 0 5790
#> availability_365 availability_category number_of_reviews_ltm
#> 0 0 0
# Treatment
airbnb_clean <- replace_na(airbnb,list(reviews_per_month=0))
# Recheck
colSums(is.na(airbnb_clean))
#> id name host_id
#> 0 0 0
#> host_name neighbourhood latitude
#> 0 0 0
#> longitude room_type price
#> 0 0 0
#> minimum_nights number_of_reviews reviews_per_month
#> 0 0 0
#> availability_365 availability_category number_of_reviews_ltm
#> 0 0 0
The treatment is to replace each NA value with 0. reviews_per_month is NA whenever number_of_reviews = 0, which means the listing has no reviews at all, so reviews_per_month = 0 is the appropriate value.
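The reasoning above can be checked before imputing. This is a minimal base R sketch on a hypothetical four-row data frame (the values only mimic the first rows of the glimpse output); it tests that NA appears exactly where there are no reviews and then fills those cells with 0.

```r
# Hypothetical listings: two with reviews, two without
listings <- data.frame(
  number_of_reviews = c(65, 0, 0, 2),
  reviews_per_month = c(0.50, NA, NA, 0.03)
)

# TRUE only if NA occurs exactly on the rows with zero reviews
na_matches_zero <- all(
  is.na(listings$reviews_per_month) == (listings$number_of_reviews == 0)
)
print(na_matches_zero)

# Replace the missing rates with 0, as in the treatment step
listings$reviews_per_month[is.na(listings$reviews_per_month)] <- 0
print(listings)
```

If the check returned FALSE on the real data, some NA values would have another cause and replacing them all with 0 would need a second look.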
# Check Missing Values
colSums(is.na(review))
#> listing_id date month year
#> 0 0 0 0
airbnb_clean %>%
duplicated() %>%
  sum()
#> [1] 0
airbnb_clean[duplicated(airbnb_clean$host_id),]
summary(airbnb_clean)
#> id name host_id
#> Min. : 27934 Length:15854 Min. : 58920
#> 1st Qu.: 21045092 Class :character 1st Qu.: 39744308
#> Median : 35037340 Mode :character Median :122455569
#> Mean :157939679702000000 Mean :154105784
#> 3rd Qu.: 52561542 3rd Qu.:239054688
#> Max. :790816217344000000 Max. :492665929
#>
#> host_name neighbourhood latitude longitude
#> Length:15854 Vadhana :2153 Min. :13.53 Min. :100.3
#> Class :character Khlong Toei:2097 1st Qu.:13.72 1st Qu.:100.5
#> Mode :character Huai Khwang:1125 Median :13.74 Median :100.6
#> Ratchathewi:1114 Mean :13.75 Mean :100.6
#> Bang Rak : 827 3rd Qu.:13.76 3rd Qu.:100.6
#> Sathon : 809 Max. :13.95 Max. :100.9
#> (Other) :7729
#> room_type price minimum_nights number_of_reviews
#> Entire home/apt:8912 Min. : 0 Min. : 1.00 Min. : 0.00
#> Hotel room : 649 1st Qu.: 900 1st Qu.: 1.00 1st Qu.: 0.00
#> Private room :5770 Median : 1429 Median : 1.00 Median : 2.00
#> Shared room : 523 Mean : 3218 Mean : 15.29 Mean : 16.65
#> 3rd Qu.: 2429 3rd Qu.: 7.00 3rd Qu.: 13.00
#> Max. :1100000 Max. :1125.00 Max. :1224.00
#>
#> reviews_per_month availability_365 availability_category
#> Min. : 0.0000 Min. : 0.0 in demand :4181
#> 1st Qu.: 0.0000 1st Qu.:138.0 not really on demand:8823
#> Median : 0.0900 Median :309.0 on the average :2850
#> Mean : 0.5162 Mean :244.4
#> 3rd Qu.: 0.6700 3rd Qu.:360.0
#> Max. :19.1300 Max. :365.0
#>
#> number_of_reviews_ltm
#> Min. : 0.000
#> 1st Qu.: 0.000
#> Median : 0.000
#> Mean : 3.482
#> 3rd Qu.: 3.000
#> Max. :325.000
#>
Insight :
- There are 4,181 properties in the in demand category, 8,823 in the not really on demand category, and 2,850 in the on the average category.
- There are 8,912 Entire home/apt, 649 Hotel room, 5,770 Private room, and 523 Shared room listings.
- The median of reviews_per_month is 0.09 and the mean is 0.5162.
1.1. Total Review Month Trend in Year.
options(dplyr.summarise.inform = FALSE)
review_month <- review_2022 %>%
filter(year%in%2022) %>%
group_by(bulan,month) %>%
count(month)
review_month <- review_month %>%
mutate(label9 = glue("
Month : {month}
Total : {comma(n)}
"))
plot_1 <- ggplot(data=review_month, aes(x=bulan, y=n)) + geom_line(color = "red")+geom_point(aes(text=label9))+
scale_x_continuous(breaks = seq(1,12,1))+scale_y_continuous(labels = comma) + labs(
title = "Total Review Month Trend in Bangkok 2022",
x = "Month",
y = "Total"
)+theme(plot.background = element_rect("#78C2AD"))
ggplotly(plot_1, tooltip = "text")
Insight :
- There was a bullish trend from February until November 2022.
- The biggest downtrend occurred in December 2022.
- The biggest uptrend occurred from October to November 2022.
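The uptrend and downtrend readings above come down to month-over-month differences in review counts. The sketch below shows that calculation in base R; the `dates` vector is a hypothetical handful of review dates, not the real reviews file.

```r
# Hypothetical review dates (one entry per review)
dates <- as.Date(c("2022-10-03", "2022-10-20", "2022-11-01",
                   "2022-11-15", "2022-11-28", "2022-12-11"))

# Count reviews per calendar month; "%Y-%m" keys sort chronologically
monthly <- table(format(dates, "%Y-%m"))

# Month-over-month change: positive is an uptrend, negative a downtrend
change <- diff(as.integer(monthly))
names(change) <- names(monthly)[-1]

print(monthly)
print(change)
```

On the real data the same `diff()` over the twelve monthly totals would give the size of each rise or fall, which makes "biggest uptrend" and "biggest downtrend" directly comparable.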
1.2. Total Review per Day in a Year.
review_month <- review_2022 %>%
count(bulan,date,month)
review_month <- review_month %>%
mutate(label2 = glue("
Month : {month}
Date : {date}
Total : {n}
"))
plot_1 <- ggplot(data=review_month, aes(x=date, y=n)) + geom_line(color="orange") + geom_point(aes(text=label2)) + scale_x_date(breaks = date_breaks("months"),
labels = date_format("%b")) + labs(
title = "Total Review Day Trend in Bangkok 2022",
x = "Month",
y = "Total"
) +
theme_dark()
ggplotly(plot_1, tooltip = "text")
Insight :
- A sideways trend occurred from the end of August until mid-October 2022.
- The downtrend occurred on 11 December 2022.
2.1. Highest Total Review Ranking Based On Room Type.
# Top properties with the highest total reviews
zz <- airbnb_clean %>%
group_by(room_type,neighbourhood) %>%
summarise(Total = sum(number_of_reviews_ltm)) %>%
arrange(desc(Total)) %>%
slice(1:2)
zz <- zz %>%
mutate(label4=glue("Total review: {comma(Total)}"))
zz$room_type <- factor(zz$room_type, levels = c("Entire home/apt","Hotel room","Private room","Shared room"))
# GGPLOT
zz$urutan <- rank(x=zz$Total,ties.method = "first")
plot3 <- ggplot(zz,aes(x=Total, y= reorder(room_type,Total), text=label4)) +
geom_col(aes(fill = neighbourhood,group = urutan),
color="black",
position = "dodge")+scale_x_continuous(labels = comma)+
scale_fill_manual(values =c("#78C2AD", "#98B0A9","#375F6F","#364B45")) +
labs(
title = "Top Property - Highest Total Review",
x = "Total Review",
y = "Room Type",
fill = "Region"
) +facet_wrap(facets = "neighbourhood",nrow = 4, scales = "free")+theme(plot.background = element_rect("#78C2AD"))
ggplotly(plot3,tooltip = "text")
agg_tot_rating <- aggregate(x = number_of_reviews_ltm~neighbourhood+room_type, data = airbnb_clean, FUN = sum)
# Entire home/apt
agg_21 <- agg_tot_rating[agg_tot_rating$room_type=="Entire home/apt",]
agg_21 <- agg_21[order(agg_21$number_of_reviews_ltm,decreasing = T),]
agg_21 <- head(agg_21,3)
plot_21 <- ggplot(agg_21,aes(x=number_of_reviews_ltm,y=reorder(neighbourhood,number_of_reviews_ltm))) + geom_col(aes(fill=number_of_reviews_ltm),color = "orange")+geom_text(aes(label=number_of_reviews_ltm),color="brown", nudge_x = -2) + labs(title= "Top 3 Region with High Total Review",subtitle = "Room Type : Entire home/apt" ,x = "Total Review",y="Region",fill="Total Review")
plot_21
# Hotel room
agg_22 <- agg_tot_rating[agg_tot_rating$room_type=="Hotel room",]
agg_22 <- agg_22[order(agg_22$number_of_reviews_ltm,decreasing = T),]
agg_22 <- head(agg_22,3)
plot_22 <- ggplot(agg_22,aes(x=number_of_reviews_ltm,y=reorder(neighbourhood,number_of_reviews_ltm))) + geom_col(aes(fill=number_of_reviews_ltm),color = "blue")+ scale_fill_gradient(low="red",high="orange")+geom_text(aes(label=number_of_reviews_ltm),color="black", nudge_x = -2) + labs(title= "Top 3 Region with High Total Review",subtitle = "Room Type : Hotel room" ,x = "Total Review",y="Region",fill="Total Review")
plot_22
# Private room
agg_23 <- agg_tot_rating[agg_tot_rating$room_type=="Private room",]
agg_23 <- agg_23[order(agg_23$number_of_reviews_ltm,decreasing = T),]
agg_23 <- head(agg_23,3)
plot_23 <- ggplot(agg_23,aes(x=number_of_reviews_ltm,y=reorder(neighbourhood,number_of_reviews_ltm))) + geom_col(aes(fill=number_of_reviews_ltm),color = "orange")+geom_text(aes(label=number_of_reviews_ltm),color="red", nudge_x = -2) + labs(title= "Top 3 Region with High Total Review",subtitle = "Room Type : Private room" ,x = "Total Review",y="Region",fill="Total Review")
plot_23
# Shared room
agg_24 <- agg_tot_rating[agg_tot_rating$room_type=="Shared room",]
agg_24 <- agg_24[order(agg_24$number_of_reviews_ltm,decreasing = T),]
agg_24 <- head(agg_24,3)
plot_24 <- ggplot(agg_24,aes(x=number_of_reviews_ltm,y=reorder(neighbourhood,number_of_reviews_ltm))) + geom_col(aes(fill=number_of_reviews_ltm),color = "blue")+ scale_fill_gradient(low="red",high="orange")+geom_text(aes(label=number_of_reviews_ltm),color="black", nudge_x = -2) + labs(title= "Top 3 Region with High Total Review",subtitle = "Room Type : Shared room" ,x = "Total Review",y="Region",fill="Total Review")
plot_24
Insight :
- The 4 regions with the highest total reviews are Khlong Toei, Lat Krabang, Phra Nakhon, and Vadhana.
- Two room types have the highest total reviews: Entire home/apt in Khlong Toei and Vadhana, and Private room in Khlong Toei and Phra Nakhon.
- Hotel room listings have the most reviews in Lat Krabang and Khlong Toei.
- Shared room listings have the most reviews in Lat Krabang and Vadhana.
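The ranking behind these insights is a "top N per group" operation: sum the reviews per region and room type, then keep the leading regions within each room type. Below is a base R sketch of that pattern; the region names and counts are hypothetical placeholders.

```r
# Hypothetical review totals per listing
reviews <- data.frame(
  neighbourhood = c("Region A", "Region B", "Region C",
                    "Region A", "Region B", "Region C"),
  room_type = c("Entire home/apt", "Entire home/apt", "Entire home/apt",
                "Private room", "Private room", "Private room"),
  number_of_reviews_ltm = c(10, 30, 20, 5, 1, 8)
)

# Sum reviews per region and room type
totals <- aggregate(number_of_reviews_ltm ~ neighbourhood + room_type,
                    data = reviews, FUN = sum)

# Sort within each room type, highest total first
totals <- totals[order(totals$room_type, -totals$number_of_reviews_ltm), ]

# Keep the top 2 regions of every room type
top2 <- do.call(rbind, lapply(split(totals, totals$room_type), head, n = 2))
print(top2)
```

Replacing `head` with `tail` in the last step would give the bottom-ranked regions instead.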
2.2. Lowest Total Review Ranking Based On Room Type.
# Bottom 10 regions per room type by total reviews
zzl <- airbnb_clean %>%
group_by(room_type,neighbourhood) %>%
summarise(Total = sum(number_of_reviews_ltm)) %>%
arrange(Total) %>%
slice(1:10)
zzl <- zzl %>%
mutate(label5=glue("Total review: {Total}
Region: {neighbourhood}"))
zzl$room_type <- factor(zzl$room_type, levels = c("Entire home/apt","Hotel room","Private room","Shared room"))
# GGPLOT
plot4 <- ggplot(zzl,aes(x=Total, y= reorder(neighbourhood,Total), text=label5)) +
geom_col(aes(fill = room_type),
color="black",
position = "dodge") + scale_fill_manual(values =c("#78C2AD", "#98B0A9","#375F6F","#364B45"))+
labs(
title = "Bottom Ranked Property - Lowest Total Review",
x = "Total Review",
y = "Region",
fill = "Room type"
) +facet_wrap(facets = "room_type",nrow = 4, scales = "free") +theme(plot.background = element_rect("#78C2AD"))
ggplotly(plot4,tooltip = "text")
Insight :
- For the Entire home/apt room type, the regions with the fewest reviews are Bang Bon, Nong Chok, Nong Khaem, Thung Khru, and Thawi Watthana.
- For the Hotel room type, the regions with the fewest reviews are Bang Khae, Bang Khen, Bang Kho laen, Bangkok Yai, Chatu Chak, Phasi Charoen, and Thon Buri.
- For the Private room type, the regions with the fewest reviews are Bang Bon, Bueng Kum, Khlong Sam Wa, Nong Chok, Nong Khaem, and Thung Khru.
- For the Shared room type, the regions with the fewest reviews are Bang Kapi, Bang Khen, Bang Na, Bang Sue, Bangkok Yai, Chom Thong, Dusit, Don Mueang, Bangkok Noi, and Bang Khun thain.
2.3. What is the average price of property in the 10 regions with the most total reviews?
agg_tot_rev <- aggregate(x = number_of_reviews_ltm ~ neighbourhood+room_type,
data = airbnb_clean,
FUN = sum)
agg_tot_rev <- agg_tot_rev[order(agg_tot_rev$number_of_reviews_ltm,decreasing = T),]
plot2.3 <- head(agg_tot_rev,10)
plot2.3 <- ggplot(plot2.3, aes(x=number_of_reviews_ltm , y=reorder(neighbourhood,number_of_reviews_ltm))) + geom_col(aes(fill=number_of_reviews_ltm),color="orange") + geom_text(aes(label=number_of_reviews_ltm),color='red',nudge_x=-1) + labs(title = "Top 10 Region with Highest Total Review",x="Total Review", y="Region",fill="Total Review")
plot2.3
2.4. What is the average price of property in the 10 regions with the lowest total reviews?
agg_tot_rev2 <- aggregate(x = number_of_reviews_ltm ~ neighbourhood,
data = airbnb_clean,
FUN = sum)
agg_tot_rev2 <- agg_tot_rev2[order(agg_tot_rev2$number_of_reviews_ltm,decreasing = T),]
plot2.4 <- tail(agg_tot_rev2,10)
plot2.4 <- ggplot(plot2.4, aes(x=number_of_reviews_ltm , y=reorder(neighbourhood,number_of_reviews_ltm))) + geom_col(aes(fill=number_of_reviews_ltm),color="orange") + geom_text(aes(label=number_of_reviews_ltm),color='red',nudge_x=-1) + labs(title = "Top 10 Region with Lowest Total Review",x="Total Review", y="Region",fill="Total Review")
plot2.4
3.1. Total Property per Room Type.
df11 <- as.data.frame(table(airbnb_clean$room_type,airbnb_clean$availability_category))
colnames(df11) <- c("Room_Type","Category","Total")
plot31 <- ggplot(df11,aes(x=Total,y=Room_Type)) +
geom_col(aes(fill=Category),color = "black",
position = "stack") +
  scale_x_continuous(labels = comma) +
  labs(title = "Total Property",
       subtitle = "Based on Room Type & Category",
       x = "Total",
       y = "Room Type",
       fill = "Category"
  )
plot31
Insight :
- Across all regions of Bangkok, the in demand and on the average categories consist mostly of Private room and Entire home/apt listings.
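The word "proportion" in this insight can be made explicit by turning the count table into shares. This is a small base R sketch with hypothetical listings; `prop.table()` with `margin = 2` divides each column (category) by its own total.

```r
# Hypothetical listings with a room type and an availability category
listings <- data.frame(
  room_type = c("Entire home/apt", "Entire home/apt", "Private room",
                "Private room", "Hotel room", "Entire home/apt"),
  availability_category = c("in demand", "in demand", "in demand",
                            "on the average", "on the average",
                            "on the average")
)

# Cross-tabulate room type (rows) against category (columns)
counts <- table(listings$room_type, listings$availability_category)

# Share of each room type within every category; columns sum to 1
shares <- prop.table(counts, margin = 2)
print(round(shares, 2))
```

Reading down a column then shows directly which room types dominate a given category.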
3.2. Total Property Proportion for All Region, Room Type, and Category.
# Count properties per region, room type, and category
tot_prop_type <- airbnb_clean %>%
group_by(neighbourhood,room_type,availability_category) %>%
summarise(count = n()) %>%
ungroup()
# Reformat data for the sunburstR package
tot_prop_type <- tot_prop_type %>%
mutate(path = paste(neighbourhood, room_type, availability_category, sep="-")) %>%
dplyr::select(path, count)
# Plot
p <- sunburst(tot_prop_type, legend=T)
p
Insight :
- Vadhana has the most properties, followed by Khlong Toei. Both have a large proportion of Entire home/apt and Private room listings.
- Vadhana, Khlong Toei, and Huai Khwang have a large proportion of in demand properties.
- Vadhana and Khlong Toei have a large proportion of not really on demand properties, followed by Huai Khwang and Ratchathewi.
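The sunburst chart is fed with one row per path, where the hierarchy levels are joined by hyphens. The sketch below builds such an input in base R from hypothetical counts. It assumes, as far as I understand the sunburstR input format, that "-" is the level separator, so hyphens inside a label are replaced first; the `clean` helper is a hypothetical name introduced only for this example.

```r
# Hypothetical counts per region, room type, and availability category
counts <- data.frame(
  neighbourhood = c("Region A", "Region A", "Region B"),
  room_type = c("Entire home/apt", "Private room", "Entire home/apt"),
  availability_category = c("in demand", "not really on demand", "in demand"),
  count = c(12, 7, 5)
)

# A hyphen inside a label would be read as an extra hierarchy level,
# so swap it for a space before joining the levels
clean <- function(x) gsub("-", " ", x, fixed = TRUE)

sunburst_input <- data.frame(
  path = paste(clean(counts$neighbourhood),
               clean(counts$room_type),
               clean(counts$availability_category),
               sep = "-"),
  count = counts$count
)
print(sunburst_input)
```

Every path then splits into exactly three levels, matching the region, room type, and category rings of the chart.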
4.1. How much is the price range for each room types?
# In demand category
# Price -> Q1 = 900 , Q3 = 2429 , 50% data of price is located in the range of Q1 - Q3
mean_harga <- airbnb_clean %>%
filter(availability_category == "in demand") %>%
group_by(room_type) %>%
ungroup()
boxp1 <- mean_harga[(mean_harga$price >= 900 & mean_harga$price <= 2500), ]
plot(x = boxp1$room_type, y = boxp1$price)
# On the average category
# Price -> Q1 = 900 , Q3 = 2429 , 50% data of price is located in the range of Q1 - Q3
mean_harga <- airbnb_clean %>%
filter(availability_category == "on the average") %>%
group_by(room_type) %>%
ungroup()
boxp2 <- mean_harga[(mean_harga$price >= 900 & mean_harga$price <= 2500), ]
plot(x = boxp2$room_type, y = boxp2$price)
# Not really on demand
# Price -> Q1 = 900 , Q3 = 2429 , 50% data of price is located in the range of Q1 - Q3
mean_harga <- airbnb_clean %>%
filter(availability_category == "not really on demand") %>%
group_by(room_type) %>%
ungroup()
boxp3 <- mean_harga[(mean_harga$price >= 900 & mean_harga$price <= 2500), ]
plot(x = boxp3$room_type, y = boxp3$price)
# Combine all property categories
# Price -> Q1 = 900 , Q3 = 2429 , 50% data of price is located in the range of Q1 - Q3
mean_harga <- airbnb_clean
boxp4 <- mean_harga[(mean_harga$price >= 900 & mean_harga$price <= 2500), ]
plot(x = boxp4$room_type, y = boxp4$price)
Insight :
- Within the price range Q1 = 900 to Q3 = 2429, prices are concentrated around HK$ 1,500 across all property categories.
- Within the same range, prices are concentrated around HK$ 1,450 for in demand properties of the Entire home/apt room type.
- Within the same range, prices are concentrated around HK$ 1,771 for in demand properties of the Hotel room type.
- Within the same range, prices are concentrated around HK$ 1,320 to HK$ 1,350 for in demand properties of the Private room and Shared room types.
4.2. How much is the price range for each property’s categories based on room type?
# Price -> Q1 = 900 , Q3 = 2429 , 50% data of price is located in the range of Q1 - Q3
bx <- airbnb_clean %>%
filter(price >= 900 & price <= 2500)
bx_plot <- ggplot(bx,aes(x=availability_category, y=price,color=room_type, fill = room_type)) + geom_boxplot(outlier.color = "red")
ggplotly(bx_plot) %>%
  layout(boxmode='group')
# + geom_jitter(mapping = aes(size=reviews_per_month,color=room_type))
Insight :
- The middle 50% of prices lies between Q1 and Q3, so the research uses Q1 = 900 and Q3 = 2429 from summary(airbnb_clean) to subset the data for the box plot visualization. The box plots then show how prices are distributed within that range.
- For the Entire home/apt room type, all 3 property categories have a similar price range, from HK$ 1,100 to HK$ 1,800.
- For in demand properties, the Entire home/apt room type has a price range from HK$ 1,114 to HK$ 1,800.
- For in demand properties, the Hotel room type has a price range from HK$ 1,300.25 to HK$ 2,200.
- For in demand properties, the Private room type has a price range from HK$ 1,100 to HK$ 1,702.
- For in demand properties, the Shared room type has a price range from HK$ 1,200 to HK$ 1,350.
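The Q1 and Q3 cut-offs used for the box plots can also be computed instead of being copied from the summary output. The following base R sketch uses ten hypothetical prices; it derives the quartiles with `quantile()`, keeps the middle 50% of listings, and summarizes them per room type.

```r
# Hypothetical daily prices for two room types
listings <- data.frame(
  room_type = rep(c("Entire home/apt", "Private room"), each = 5),
  price = c(1000, 1200, 1400, 1600, 1800, 500, 900, 1300, 1700, 5000)
)

# First and third quartile of all prices
q <- quantile(listings$price, probs = c(0.25, 0.75))
print(q)

# Keep only listings priced between Q1 and Q3 (the middle 50%)
middle <- listings[listings$price >= q[1] & listings$price <= q[2], ]

# Median price per room type inside that range
by_type <- aggregate(price ~ room_type, data = middle, FUN = median)
print(by_type)
```

Computing the bounds this way keeps the filter in step with the data, so a refreshed listings file would not require editing hard-coded numbers.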