1 Airbnb Reviews and Listings

Airbnb is a room and home rental service operating in major cities across the world. It is owned by Airbnb Inc. and operated through its website and smartphone applications. The current dataset contains listings, with host information, and customer reviews for Airbnb homes in 10 major cities of the world. We analyse the dataset and visualize its trends with the help of base R and ggplot2 graphics. We start by importing the dataset, followed by data cleaning and visualization.

1.1 Data Cleaning

Both datasets contain character and numerical columns, along with dates. We first read the csv files using the read.csv2 command and inspect the data types with the help of the glimpse command. Most data types in the listings dataset are correct, with a few exceptions: for instance, the longitude and latitude columns are read as character columns. The data types in the listings dataset are given by

# getting overview of datatypes with glimpse command
listings %>% glimpse()

and in the reviews dataset data types are

reviews %>% glimpse()

Note that chr denotes character, dbl denotes double and int denotes integer in the results above. We convert longitude, latitude, host_response_rate and host_acceptance_rate to numeric in the listings dataset. In the reviews dataset, all columns have the required data type except the date column.

# changing data types of some columns
listings <- listings %>%
  mutate(across(c(host_response_rate, latitude,longitude,host_acceptance_rate), as.numeric))
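
After a conversion like this it can be worth checking whether any value failed to parse, because as.numeric returns NA for strings that are not numbers. The following is a minimal sketch on a made-up data frame; toy_listings and its values are hypothetical and not taken from the dataset.

```r
# Toy data frame with numeric values stored as text (hypothetical values)
toy_listings <- data.frame(
  latitude = c("48.8566", "40.7128", "not available"),
  host_response_rate = c("0.96", "1", ""),
  stringsAsFactors = FALSE
)

# Convert every column to numeric; strings that cannot be parsed become NA
toy_numeric <- as.data.frame(
  lapply(toy_listings, function(x) suppressWarnings(as.numeric(x)))
)

# Count how many values per column turned into NA during the conversion
na_introduced <- colSums(is.na(toy_numeric))
print(na_introduced)
```

A non-zero count points at cells that need cleaning before the conversion, for example empty strings or text placeholders.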

Similarly, each data frame has a date column: date in reviews and host_since in listings. We convert them to the Date class using the as.Date command from base R.

# converting date columns to date format 
reviews$date <- as.Date(reviews$date)
listings$host_since <- as.Date(listings$host_since)
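
as.Date relies on the text being in a format it recognises. The sketch below passes the format explicitly and shows that date arithmetic works after the conversion. It assumes the dates are stored as year-month-day text; the two example dates are made up.

```r
# Hypothetical date strings, assumed to be in year-month-day form
date_text <- c("2015-03-21", "2018-11-02")

# Stating the format explicitly avoids relying on automatic detection
parsed <- as.Date(date_text, format = "%Y-%m-%d")
print(class(parsed))

# Once the column is a Date, differences are measured in days
days_between <- as.numeric(diff(parsed))
print(days_between)
```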

After converting the data types to the desired form, we compute descriptive statistics of the datasets using the tbl_summary command from the gtsummary package.

# complete.cases() keeps only the rows that have no missing (NA) values.
sub_listings <- listings %>%  .[complete.cases(.), ] %>%  dplyr::select("host_response_time","host_response_rate" ,"host_acceptance_rate", "host_is_superhost", "host_total_listings_count","host_has_profile_pic", "host_identity_verified" )

# add_stat_label() from the gtsummary package adds statistic labels such as median and interquartile range (IQR)
sub_listings %>% gtsummary::tbl_summary() %>% add_stat_label()
Characteristic N = 93,322
host_response_time, n (%)
    a few days or more 5,470 (5.9%)
    within a day 12,618 (14%)
    within a few hours 17,813 (19%)
    within an hour 57,421 (62%)
host_response_rate, Median (IQR) 1.00 (0.96, 1.00)
host_acceptance_rate, Median (IQR) 0.98 (0.83, 1.00)
host_is_superhost, n (%)
    f 60,669 (65%)
    t 32,653 (35%)
host_total_listings_count, Median (IQR) 3 (1, 8)
host_has_profile_pic, n (%)
    f 88 (<0.1%)
    t 93,234 (100%)
host_identity_verified, n (%)
    f 15,440 (17%)
    t 77,882 (83%)

The summary tables above and below show the host variables and residence characteristics in the listings dataset, restricted to rows without missing values. We chose these variables arbitrarily to get a first impression of the data from the percentages of some categorical variables. Overall, the largest number of listings is registered for Paris, followed by New York. Most hosts rent out the entire place to customers, with a share above 66%. It suggests that Airbnb hosts are mostly domestic users. Shared rooms and hotel rooms have the lowest shares in the data for these 10 cities. The median listing accommodates 3 guests and has 1 bedroom.

sub_listings <- listings %>%  .[complete.cases(.), ] %>%  dplyr::select("city","room_type","accommodates","bedrooms","maximum_nights" )

sub_listings %>% gtsummary::tbl_summary() %>% add_stat_label()
Characteristic N = 93,322
city, n (%)
    Bangkok 6,107 (6.5%)
    Cape Town 9,255 (9.9%)
    Hong Kong 2,252 (2.4%)
    Istanbul 7,902 (8.5%)
    Mexico City 10,762 (12%)
    New York 11,876 (13%)
    Paris 13,690 (15%)
    Rio de Janeiro 10,995 (12%)
    Rome 11,722 (13%)
    Sydney 8,761 (9.4%)
room_type, n (%)
    Entire place 66,578 (71%)
    Hotel room 2,417 (2.6%)
    Private room 23,229 (25%)
    Shared room 1,098 (1.2%)
accommodates, Median (IQR) 3 (2, 5)
bedrooms, Median (IQR) 1 (1, 2)
maximum_nights, Median (IQR) 1,125 (90, 1,125)

With respect to host response time, most hosts reply within an hour, which is very quick for a rental service. This is consistent with the host response rate, which has a median value of 1. Most hosts are not ranked as superhosts in our data, and this is not linked to having a profile picture, since only a very small percentage of hosts (<0.1%) have no profile picture. Overall, hosts receive high ratings in our data.

In both datasets we can check what percentage of rows is missing (NA) in each column. The results show that the reviews dataset has no missing values, while the listings dataset has more than 30% missing values in some columns. Because the listings dataset has many columns, we print the results only for the reviews dataset here.

# percentage of missing values in each column, computed with is.na and colMeans
colMeans(is.na(reviews)) * 100
##  listing_id   review_id        date reviewer_id 
##           0           0           0           0
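
For a wide table such as listings, one way to keep this output short is to print only the columns above a threshold. The sketch below uses a small made-up data frame; the column names mirror the dataset, while the values and the 30% cut-off applied to them are only illustrative.

```r
# Toy data frame with missing values (hypothetical values)
toy <- data.frame(
  bedrooms = c(1, NA, 2, NA),
  price = c(50, 80, 120, 95),
  review_scores_rating = c(NA, NA, 90, NA)
)

# Percentage of missing values per column
missing_pct <- colMeans(is.na(toy)) * 100

# Keep only columns with more than 30% missing values, largest first
high_missing <- sort(missing_pct[missing_pct > 30], decreasing = TRUE)
print(high_missing)
```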

1.2 Hosts with multiple listings

We can check whether any hosts have listings across multiple cities. For this purpose we wrote a function that takes a data frame as its argument. It groups by the host_id column and collects the unique values of the city column for each host. Finally, it uses the base R function sapply to count the distinct cities of each host and prints how many hosts fall into each count.

# custom function: group by host_id and collect each host's unique cities; city_count then uses sapply to return the number of distinct cities per host
hostListingAcrossCity <- function(df) {
  hostListingCityCount <- df %>%
    group_by(host_id) %>%
    summarize(city = list(unique(city))) %>%
    ungroup() %>%
    mutate(city_count = sapply(city, length))
  print(table(hostListingCityCount$city_count))
}

hostListingAcrossCity(listings)
## 
##      1      2      3      8 
## 181874    147      1      2

Our results show that only 2 hosts have listings across 8 cities, while most hosts have listings in only 1 city.
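
The same count can be sketched in base R with tapply, which may help when checking the function on a small example. The data frame below is made up for illustration.

```r
# Toy listings: host 1 is in two cities, host 2 in one, host 3 in two
toy <- data.frame(
  host_id = c(1, 1, 2, 3, 3, 3),
  city = c("Paris", "Rome", "Paris", "Sydney", "Sydney", "Bangkok")
)

# Number of distinct cities for each host
cities_per_host <- tapply(toy$city, toy$host_id, function(x) length(unique(x)))

# How many hosts have 1, 2, ... distinct cities
city_count_table <- table(cities_per_host)
print(city_count_table)
```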

1.3 Percentage of hosts ranked as superhost

A host is ranked as a superhost if they have at least 10 trips in at least 100 nights and a response rate of at least 90%. Along with these 2 conditions, other conditions for a host to be ranked as a superhost are given on the link. In our data, superhost status is a categorical variable with the values t and f. The data description on the link provided shows that f stands for false and t stands for true. Figure 1.1 plots the superhost results for our data, showing for each city only the percentage of false values, labelled f_percent. Sydney has the lowest share of superhosts, followed by Paris.

percent_superhost <- listings %>%
  filter(host_is_superhost != "") %>%
  group_by(city, host_is_superhost) %>%
  summarize(n = n(), .groups = "drop_last") %>%
  # share of each superhost value within its city
  mutate(total = sum(n),
         percent = n / total * 100) %>%
  ungroup() %>%
  # drop n before pivoting so that each city ends up on a single row
  dplyr::select(-n) %>%
  pivot_wider(names_from = host_is_superhost,
              values_from = percent,
              names_glue = "{host_is_superhost}_percent")

ggplot(percent_superhost)+aes(city,f_percent)+geom_col() +
 ggnuplot::theme_gnuplot()+
  # rotate the x-axis labels by 45 degrees with angle = 45
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Figure 1.1: Percentage of host ranked as ‘f’ in super host category
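
The per-city percentages behind this figure can be cross-checked with a base R contingency table. This is a minimal sketch on made-up rows, not the dataset itself.

```r
# Toy data: 4 Paris hosts (3 f, 1 t) and 2 Sydney hosts (1 f, 1 t)
toy <- data.frame(
  city = c("Paris", "Paris", "Paris", "Paris", "Sydney", "Sydney"),
  host_is_superhost = c("f", "f", "f", "t", "f", "t")
)

# Counts by city and superhost value
counts <- table(toy$city, toy$host_is_superhost)

# margin = 1 normalises within each row, so each city sums to 100
row_percent <- prop.table(counts, margin = 1) * 100
print(row_percent)
```

The f column of row_percent plays the role of f_percent in the plot.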

The higher the percentage of ‘f’, the smaller the share of hosts labelled as superhosts. Figure 1.1 also shows that Mexico City has the highest percentage of superhosts among its listings. We can analyse whether this is directly linked to the price or to the number of guests the host accommodates. In figure 1.2 we can observe that Rio de Janeiro has the highest Airbnb prices. Sydney, which did not rank high for superhosts, is among the cheapest cities for customers. Therefore, we conclude that price and the superhost label are not directly linked.

listings %>%
  filter(host_is_superhost != "") %>%
  ggplot() +
  aes(x = city, y = price, fill = host_is_superhost) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = comma_format(scale = 0.01, accuracy = 1)) +
  ggnuplot::theme_gnuplot()+
  theme(legend.position = "bottom",axis.text.x = element_text(angle = 45, hjust = 1))+
  labs(x = "City", y = "Price", fill = "Superhost Status")

Figure 1.2: Prices of airbnb homes in each city linked to superhost status

1.4 Map of the cities

We also plotted the cities under study on a world map, with each city shown in a different dot colour. The main purpose of figure 1.3 was to practise the efficient use of the map functions in ggplot2 and the maps package. The map shows how widely the Airbnb cities are spread around the world, which also reflects its popularity. The map could only be plotted because the listings dataset includes longitude and latitude data.

# Create a world map
world_map <- map_data("world")

# Create a scatterplot with the latitude and longitude coordinates
ggplot(listings, aes(x = longitude, y = latitude, color = city)) +
  geom_point(size = 2) +
  # manual colours for the dots of each city
  scale_color_manual(values = c("New York" = "red", "Paris" = "blue", "Sydney" = "yellow", "Bangkok" = "green", "Cape Town" = "orange", "Istanbul" = "black", "Rome" = "lightblue", "Mexico City" = "lightyellow", "Rio de Janeiro" = "pink", "Hong Kong" = "#112446")) +
  theme(legend.position = "bottom") +
  ggtitle("Latitude and Longitude Coordinates on a World Map") +
  # xlab and ylab for custom axis labels
  xlab("Longitude") + ylab("Latitude") +
  # alpha is used for transparency
  geom_path(data = world_map, aes(x = long, y = lat, group = group), color = "gray20", alpha = 0.5)

Figure 1.3: Location of the Airbnb listings of each city on the world map

1.5 Maximum nights in airbnb

In Airbnb rentals, it is usually up to the host to set the maximum number of nights for a customer. Our dataset has a variable named maximum_nights, which takes into account the nights spent by the customer at Airbnbs. It is interesting to note the large variation in this variable; the range of values is shown below. Table ?? shows that the median of maximum nights is 1,125, which suggests that customers usually stay at an Airbnb for months in these cities. There could be several reasons for this value, such as the minimum booking allowed by the host, the price of the rental, amenities etc. Another reason could be that people prefer Airbnb rooms because they are cheap compared to hotels.

range(listings$maximum_nights, na.rm = TRUE)
## [1]          1 2147483647
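
The upper end of this range, 2147483647, equals the largest value a 32-bit signed integer can hold, so it may be a placeholder or a data entry artefact rather than a genuine number of nights. That reading is an assumption; the data description does not say so. The sketch below uses made-up values to show how strongly one such entry affects the mean while leaving the median unchanged.

```r
# The largest value R's integer type can represent
max_int <- .Machine$integer.max
print(max_int)

# Hypothetical maximum_nights values, including the extreme one
maximum_nights <- c(30, 90, 1125, 1125, 2147483647)

# The mean is dominated by the single extreme value; the median is not
mean_nights <- mean(maximum_nights)
median_nights <- median(maximum_nights)

# Share of values that survive a cap of 600 nights (an illustrative cap)
share_kept <- mean(maximum_nights <= 600)
print(c(mean = mean_nights, median = median_nights, share_kept = share_kept))
```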

We plotted the relation between maximum nights and the superhost label as a boxplot. The main purpose of figure 1.4 is to visualize the median of maximum nights, its interquartile range and any outliers in the relation between the two variables. The figure shows that, for both superhost categories, the median is below 200 nights in all room types. The 3rd quartile is also below 150 for all categories except hotel rooms, which shows that 75% of customers spend fewer than 150 nights at Airbnbs. For hotel rooms with a non-superhost label, maximum nights rise above 150. To relate maximum nights to the superhost label, we compare its values false and true. If the host is not a superhost (label ‘f’), the median of maximum nights is not affected in any room type except when the host rents out the entire place. For entire places, the median of maximum nights is less than 1 month for a non-superhost, while it rises to more than a month if the host is ranked as a superhost on Airbnb.

listings %>%
  # keep rows with 1 to 9 bedrooms and maximum nights of at most 600
  filter(!is.na(bedrooms) & bedrooms >= 1L & bedrooms <= 9L) %>%
  filter(maximum_nights >= 0L & maximum_nights <= 600) %>%
  # drop rows where the host_has_profile_pic column does not have any data
  filter(!(host_has_profile_pic %in% "")) %>%
  ggplot() +
  aes(x = maximum_nights, y = host_is_superhost) +
  # custom fill colour for the boxplot, given as a hex code
  geom_boxplot(fill = "#860446") +
  ggnuplot::theme_gnuplot() +
  facet_wrap(vars(room_type)) +
  labs(title = "Maximum nights allowed by host in each room type",
       x = "Maximum nights",
       y = "Host is superhost ?") +
  scale_x_continuous(breaks = seq(0, 600, by = 50))

Figure 1.4: Variation of maximum nights with superhost status

There are many outliers in the maximum nights spent by customers in each room type. Most Airbnb customers do not stay at the rental places for more than a month, which suggests that Airbnbs are used as rentals at travel destinations, mainly by tourists. The outliers may be due to some people renting an Airbnb place instead of signing a rental contract with a landlord.
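
Boxplots flag outliers with the usual rule of 1.5 times the interquartile range beyond the box. The base R function boxplot.stats applies that rule and can list the flagged values directly. The values below are made up, and boxplot.stats uses hinges, which can differ slightly from the quartiles ggplot2 computes.

```r
# Hypothetical lengths of stay in nights, with one very long stay
nights <- c(7, 14, 21, 28, 30, 30, 60, 365)

stats <- boxplot.stats(nights)

# Five numbers: lower whisker, lower hinge, median, upper hinge, upper whisker
print(stats$stats)

# Values lying more than 1.5 * IQR beyond the hinges
print(stats$out)
```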

listings %>%
  # keep rows with a price of at most 500 and drop rows where host_response_time has no data
 filter(price <= 500 & host_response_time != "") %>%
 ggplot() +
 aes(x = price, y = host_response_time, fill = instant_bookable) +
 geom_boxplot() +
 scale_fill_brewer(palette = "Dark2", direction = 1) +
 labs(y = "Host response time", x="Price", fill = "room is instant bookable ?") +
 ggnuplot::theme_gnuplot() +
 theme(legend.position = "bottom")

Figure 1.5: Host response time linked to the facility of room as instant bookable

1.6 Instant bookable airbnb and host response time

We linked the host response time with the price in a box plot in figure 1.5. Price is plotted on the x-axis and host response time on the y-axis. Host response time is a categorical variable in the data, with values such as ‘within a day’ and ‘within an hour’. The colour shading shows whether the room is instant bookable. It is important to find a relation between these variables, since the host response time could be linked to instant booking of the room. Customers often book Airbnbs at short notice after they travel to a city, so if the host takes too long to respond, the chance of a customer booking the Airbnb can decrease drastically. For better readability we filter out all rentals with a price above 500 dollars.

listings %>%  filter(!(host_has_profile_pic %in% "")) %>%
 ggplot() +
 aes(x = host_acceptance_rate, y = host_has_profile_pic, fill = city) +
 geom_boxplot() +
 ggnuplot::theme_gnuplot()+
 theme(legend.position = "bottom")

Figure 1.6: Host acceptance rate by profile picture status in each city


ggsave("~/Downloads/Airbnb Data/Airbnb_files/figure-latex/boxplot2-1.pdf")

Figure 1.5 shows that, for both categories of instant bookable rooms, the median price is around 100 dollars. The maximum price in this restricted range can reach the 500 dollar cap. In these bookings, the 75th percentile is around 250 dollars, which shows that rooms are usually booked for between 100 and 250 dollars. It could be interesting to link this price range with the number of nights the room has been booked. We now analyse the relation between host response time and price. Across prices, if the host takes a few days or more to respond, the room is more likely not to be instant bookable, since outliers exist in this category. A similar trend is observed for hosts who respond within a day. In contrast, if the host responds within an hour or within a few hours, the room is more likely to be booked, even if it is labelled false in the instant bookable category.

result <- listings %>% 
  group_by(city, host_since) %>% 
  summarize(count = n()) %>% 
  arrange(host_since)

ggplot(result, aes(x=host_since, y=count, color=city)) + geom_path() + 
facet_wrap(~city) + 
  ggnuplot::theme_gnuplot() + 
theme(axis.text.x = element_text(angle = 30, hjust = 1))+
scale_fill_brewer(palette = "Dark2", direction = 1) 

Figure 1.7: Number of hosts registered over the years

1.7 Host registration on Airbnb

Many hosts have been registered on Airbnb for a number of years. The host_since column in the dataset gives the date when the host first registered on Airbnb. Our plot in figure 1.7 shows the number of hosts registered since 2010. Paris had the highest number of hosts registering in 2018. According to the trends for the different cities, the number of registrations keeps varying across seasons.
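
Statements about a whole year are easier to verify when registrations are counted per year rather than per date. The following is a minimal sketch with made-up dates and cities, extracting the year with format.

```r
# Hypothetical registration dates and cities
host_since <- as.Date(c("2012-05-01", "2018-03-15", "2018-07-09",
                        "2018-11-30", "2019-01-02"))
city <- c("Paris", "Paris", "Paris", "Rome", "Rome")

# Extract the registration year as text
reg_year <- format(host_since, "%Y")

# Registrations per city and year
per_year <- table(city, reg_year)
print(per_year)
```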

1.8 Host acceptance rate and profile pic

Sometimes a host declines to put a profile picture on their Airbnb account. This can make a customer suspicious that the rental place is fake. For this reason we plotted a boxplot of the acceptance rate against the host profile picture. The acceptance rate ranges between 0 and 1 in the data, while the host profile picture is shown by the labels f and t: a host is labelled f if they do not have a profile picture and t if they do. The colour of the boxplot indicates the city. Figure 1.6 shows that if the host does not have a profile picture, the median acceptance rate drops compared to a host who has one. It is important to note that many outliers exist in the category of hosts with a profile picture. This describes a tendency of customers to give a host a low acceptance rating irrespective of the profile picture. The low acceptance rate for hosts with a profile picture can be due to other reasons, such as low check-in scores and bad treatment by the host during the customer's stay. Out of all cities, Rome and Bangkok have the highest acceptance rates for hosts both with and without profile pictures, while the other cities show mixed behaviour. One reason for the large discrepancies may be customers not filling in the values in the surveys on the Airbnb app.

listings %>%
  group_by(city) %>%
  # summarizing number of observations by each group
  summarize(n = n()) %>%
  ggplot() +
  aes(x = city, y = n) +
  # geom_count() draws one point per city at height n; geom_col() would draw bars instead
  geom_count(fill = "blue") +
  xlab("City") +
  ylab("Number of Listings") +
  ggtitle("Number of Listings in Each City") +
 ggnuplot::theme_gnuplot() + 
  # rotate the x-axis labels by 30 degrees with horizontal adjustment 1; legend.position = "none" removes the legend
theme(axis.text.x = element_text(angle = 30, hjust = 1),legend.position = "none")+
  # using a ColorBrewer palette for the fill scale
scale_fill_brewer(palette = "Dark2", direction = 1) 

Figure 1.8: Number of listings in each city

1.9 Number of listings in each city

Out of the 10 cities in our study, Paris has the highest number of listings registered on Airbnb, as shown in figure 1.8. New York takes second position, followed by Sydney. Hong Kong has the fewest listings.

# selecting numeric columns by name
results <- listings %>% dplyr::select(c(
 "host_response_rate"   ,       "host_acceptance_rate"       , "host_total_listings_count"  ,  "accommodates",
"bedrooms"       ,    
"price"                       ,"minimum_nights",
"maximum_nights"              ,"review_scores_rating"  ,     
"review_scores_accuracy"      ,"review_scores_cleanliness"  ,
"review_scores_checkin"      , "review_scores_communication",
"review_scores_location"    ,  "review_scores_value"))%>%  .[complete.cases(.), ]

# rounding the correlation values to 2 decimal places
corr <- round(cor(select_if(results, is.numeric)), 2)

# plotting with the ggcorrplot package. method = "circle" draws circles whose size depends on the value of the correlation coefficient.
library(ggcorrplot)
ggcorrplot(corr, type = "lower", lab = TRUE,
            lab_size = 2, method = "circle")

Figure 1.9: Correlation between the numeric variables

1.10 Correlation plot

Correlation plots are used in model validation and machine learning techniques in STEM fields. Figure 1.9 shows the correlation plot for the numerical variables in our listings dataset. The values on the plot are the correlation coefficients, calculated with the base R cor function and drawn with the ggcorrplot package. The highest correlation is between review_scores_rating and review_scores_accuracy, with a value of 0.79. The figure shows that all the review scores are highly correlated with each other, with values > 0.5. In contrast, price has a very low correlation with the number of bedrooms and the number of guests accommodated. Similarly, price has a very low or negative correlation with the review scores, which shows that price does not affect the customers' review scores and that review scores depend on other factors. We can hypothesise that a host who treats customers well can get a high review rating.
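
To make the matrix behind such a plot concrete, the sketch below computes a rounded correlation matrix on a small made-up data frame. cor uses the Pearson coefficient by default; the column names and values are hypothetical.

```r
# Toy data: two review-like scores that move together and an unrelated price
toy <- data.frame(
  rating = c(80, 85, 90, 95, 100),
  accuracy = c(8, 8, 9, 10, 10),
  price = c(120, 60, 200, 90, 150)
)

# Pearson correlation matrix, rounded to 2 decimal places
corr_toy <- round(cor(toy), 2)
print(corr_toy)
```

The matrix is symmetric with ones on the diagonal, which is why plotting only the lower triangle loses no information.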

listings$cities <- listings$city
listings$pricing <- round((listings$price - mean(listings$price))/sd(listings$price), 2)   # computing the normalised price (z-score)
listings$pricing_type <- ifelse(listings$pricing < 0, "below", "above") # new column labelling the normalised price: "below" if it is less than 0, otherwise "above"
listings <- listings[order(listings$pricing), ]  # sorting by the normalised price


ggplot(listings, aes(x=cities, y=pricing, label=pricing)) + 
  geom_bar(stat='identity', aes(fill=pricing_type), width=.5)  +
  # added custom fill colors
  scale_fill_manual(name="Price", 
                    labels = c("Above Average", "Below Average"), 
                    values = c("above"="#00ba38", "below"="#f9165d")) + 
  labs(title= "Diverging Bars") + 
  # coord_flip() swaps the x and y axes
  coord_flip()+
  
  ggnuplot::theme_gnuplot()+
  theme(legend.position = "bottom")

Figure 1.10: Normalised price evolution in each city

1.11 Diverging bars for the normalised pricing

Price is a very important factor for a customer booking an Airbnb. We plotted a diverging bar plot after normalising the price by subtracting its mean and dividing by its standard deviation. For each city, this normalised price is plotted as a bar plot with the labels above and below average pricing in figure 1.10. Cape Town and Bangkok have above average prices for their Airbnbs, while the other cities rarely follow this trend. Among the cities in the study, Paris and New York have the lowest prices.
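
The normalisation used here is the standard z-score: subtract the mean, then divide by the standard deviation. The following is a small worked sketch on made-up prices; it also shows that base R's scale gives the same result.

```r
# Hypothetical prices with mean 200
price <- c(50, 80, 120, 150, 600)

# z-score: distance from the mean in units of standard deviation
pricing <- round((price - mean(price)) / sd(price), 2)

# Label each value relative to the average
pricing_type <- ifelse(pricing < 0, "below", "above")
print(data.frame(price, pricing, pricing_type))

# scale() centres and scales in the same way
same_as_scale <- isTRUE(all.equal(as.numeric(scale(price)),
                                  (price - mean(price)) / sd(price)))
print(same_as_scale)
```

A single expensive listing pulls the mean up, so most values can end up labelled "below" even when they are typical.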

# grouping by the room types in data frame and adding the percentages
df <- listings %>%
  group_by(room_type) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  mutate(percentage = round(n/sum(n)*100, 2))

# Creating a pie chart with the percentages
pie <- ggplot(df, aes(x = "", y = n, fill = factor(room_type))) + 
  geom_bar(width = 1, stat = "identity") +
  # geom_text will label the pie chart 
  geom_text(aes(label = percent(percentage/100)), position = position_stack(vjust = 0.5)) +
  theme(axis.line = element_blank(), 
        plot.title = element_text(hjust=0.5)) + 
  labs(fill="Room Type", 
       x=NULL, 
       y=NULL)+
  # coord_polar() is required to turn the stacked bar into a pie chart
  coord_polar(theta = "y") + 
  ggnuplot::theme_gnuplot()

ggsave("~/Downloads/Airbnb Data/Airbnb_files/figure-html/pie1-1.pdf")


pie

Figure 1.11: Percentage of room types in airbnb listings

1.12 Room types

There are 4 room types in our Airbnb data: entire place, private room, hotel room and shared room. More than 65% of hosts rent out their entire place on Airbnb. After this dominant category, private rooms are rented the most on the site. There is a very small difference in the number of listings between the remaining 2 categories, with shares of 2.09% for hotel rooms and 1.74% for shared rooms.

# creating a list of data frames from listings data.
df_list <- list(Heating =  listings %>% mutate(Heating = str_extract(amenities, "Heating")) %>% dplyr::select(Heating),
                Wifi =  listings %>% mutate(Wifi = str_extract(amenities, "Wifi")) %>% dplyr::select(Wifi),
                Kitchen =  listings %>% mutate(Kitchen = str_extract(amenities, "Kitchen")) %>% dplyr::select(Kitchen),
                Essentials =  listings %>% mutate(Essentials = str_extract(amenities, "Essentials")) %>% dplyr::select(Essentials),
                Washer =  listings %>% mutate(Washer = str_extract(amenities, "Washer")) %>% dplyr::select(Washer))

# calculating the proportion of non-NA values in each data frame with the help of a for loop
prop_non_na <- c()
for (df_name in names(df_list)) {
  prop_non_na <- c(prop_non_na, mean(!is.na(df_list[[df_name]])))
}

# calculating the percentage of non-NA values
prop_non_na_percent <- prop_non_na * 100

pie(prop_non_na_percent, labels = NULL, col = rainbow(length(df_list)))
legend_labels <- paste(names(df_list), round(prop_non_na_percent), "%", sep = " ")
# place the legend at explicit coordinates; xpd = TRUE allows drawing outside the plot region
legend(x = 0.65, y = 1.1, legend = legend_labels, cex = 0.8, fill = rainbow(length(df_list)),
       bty = "n", title = "Columns", xpd = TRUE)

Figure 1.12: Percent of each amenity in the airbnb listings.

1.13 Ratio of amenities provided

In Airbnb rooms, hosts provide facilities to their customers such as hot water, wifi and heating. These facilities are usually included in the price before the customers check in. The amenities are often provided in combination, for example heating together with wifi, hot water and a kitchen. Out of all the amenities in our data, we picked an arbitrary selection of those found most often across all cities. The results are plotted as a pie chart in figure 1.12. We found that the Essentials label is used the most for amenities in the Airbnb homes, along with Wifi. Essentials is usually a check box in the app when the host lists a room for rent. Other hosts give more detail for each amenity, which is why the percentage for Wifi approximately matches that of Essentials. A kitchen is also provided in most rentals, with a value of 86%. Overall, more than 80% of Airbnb homes provide a kitchen, essentials and wifi, while heating and washer are below 70%.
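
The share of listings offering each amenity can also be sketched in base R with grepl, without building one data frame per amenity. The amenity strings below are made up, and their bracketed, quoted layout is an assumption about how the amenities column is stored.

```r
# Hypothetical amenities strings, one per listing
amenities <- c(
  '["Wifi", "Kitchen", "Essentials"]',
  '["Heating", "Wifi"]',
  '["Kitchen", "Washer", "Essentials", "Wifi"]',
  '["Essentials"]'
)

wanted <- c("Heating", "Wifi", "Kitchen", "Essentials", "Washer")

# Percentage of listings whose amenities text contains each name
share <- sapply(wanted, function(a) mean(grepl(a, amenities, fixed = TRUE)) * 100)
print(share)
```

This is plain substring matching, so a name such as Kitchen would also match a longer label like Kitchenette; stricter matching would need the quotes included in the pattern.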

1.14 Conclusion

Airbnb Inc. is an international company offering a room rental service around the world. We were provided with reviews and listings data from the Kaggle platform and performed a brief exploratory data analysis of the 2 datasets. Most of the review, host and customer related data is in the listings dataset, while the reviews dataset contains the dates on which the reviews were written in the app and the ids of the customer and the listing. In the listings dataset there are columns with more than 30% missing values. We analysed the data in different tables and graphs using the gtsummary, ggplot2 and base R plotting functions. For the custom plot theme we used a gnuplot theme from a GitHub repo.

Analysis of the data shows the following observations:

  • Hosts provide a combination of amenities to their customers, such as washer, wifi and kitchen. Out of these amenities, wifi, kitchen and essentials are the most common in the Airbnb homes.

  • Cape Town and Bangkok have above average pricing compared to the other cities.

  • Usually customers rent the rooms for less than 2 months.

  • There are many outliers in the maximum nights spent by the customers in the rooms, which shows that some customers stay at an Airbnb for longer periods of time.

  • There are some rental places which have more than 15 bedrooms registered.

  • Hosts with a profile picture have higher acceptance rates.

  • The price of a room is not directly linked to the host being ranked as a superhost.

  • A very small percentage of hosts (<0.1%) do not use the option of a profile picture.

  • Customers generally give a review rating of more than 8 on the app.

  • Most of the hosts rent in a single city.