1 INTRODUCTION

Airbnb is a well-known community platform for renting and booking private homes that directly connects hosts and visitors. It handles reservations between the person offering accommodation and the traveler who wishes to rent it. Founded in 2008 by Americans Brian Chesky and Joe Gebbia, Airbnb has been valued at $31 billion.

In this project, we worked on the Airbnb dataset for Paris and carried out a simple descriptive analysis. Focusing on the rental price, we tried to identify the factors that determine it.

2 DATASET OVERVIEW

First, let’s take a look at the variables in the Airbnb listings dataset (Table 1):

## Observations: 56,535
## Variables: 16
## $ id                             <int> 3508970, 13222966, 7337128, 576...
## $ name                           <chr> "Cosy Aptmt Bastille - Gare de ...
## $ host_id                        <int> 17667828, 12188988, 6403392, 14...
## $ host_name                      <chr> "Emmanuelle", "Sarah & Manu", "...
## $ neighbourhood_group            <lgl> NA, NA, NA, NA, NA, NA, NA, NA,...
## $ neighbourhood                  <chr> "Reuilly", "Reuilly", "Reuilly"...
## $ latitude                       <dbl> 48.84875, 48.84731, 48.84661, 4...
## $ longitude                      <dbl> 2.376042, 2.396370, 2.407639, 2...
## $ room_type                      <chr> "Entire home/apt", "Entire home...
## $ price                          <int> 80, 55, 104, 45, 35, 50, 80, 75...
## $ minimum_nights                 <int> 2, 1, 1, 3, 2, 1, 1, 4, 4, 3, 4...
## $ number_of_reviews              <int> 10, 3, 12, 13, 1, 22, 0, 7, 10,...
## $ last_review                    <chr> "2017-03-19", "2016-07-29", "20...
## $ reviews_per_month              <dbl> 0.46, 0.30, 0.58, 1.21, 0.06, 2...
## $ calculated_host_listings_count <int> 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1...
## $ availability_365               <int> 282, 0, 365, 2, 0, 76, 363, 292...
Table 1: Dataset Overview

Our dataset contains 16 variables and 56,535 observations. Each row describes one Airbnb rental through attributes such as room type, price, and availability, together with the name and id of the host. A quick look shows that some variables contain missing values and that some variables will not be needed. In particular, “id” and “host_name” are identifiers that tell us nothing about the rental price, and the same holds for the variable “last_review”.

3 MISSING VALUES TREATMENT AND DATA SELECTION

3.1 Data selection

To analyse the price mechanism effectively, we first produced a summary of the dataset to check for outliers and missing values. The result showed that one column, “neighbourhood_group”, consists entirely of missing values. Since the variable “neighbourhood” already contains the information we need, we decided to drop “neighbourhood_group”, along with “last_review”.

3.2 Missing Values Treatment

In our dataset, the variable “reviews_per_month” has missing values. These values represent about 25% of all observations. For that reason, we could not simply delete every row with a missing value, because we would lose a lot of information. Instead, we handled them case by case with the options “na.rm = TRUE” and “na.omit()” when executing the R code.
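
The difference between the two options can be illustrated with a minimal sketch. The data frame below is made up for the example (only the column names come from the dataset), and the exact calls used in the project may have differed.

```r
# Toy data frame with hypothetical values; only the column names are real.
listings <- data.frame(
  price             = c(80, 55, 104, 45),
  reviews_per_month = c(0.46, NA, 0.58, NA)
)

# Share of missing values in reviews_per_month
na_share <- mean(is.na(listings$reviews_per_month))

# na.rm = TRUE skips the NAs for this one computation and keeps all rows
mean_reviews <- mean(listings$reviews_per_month, na.rm = TRUE)

# na.omit() drops every row that contains at least one NA
complete_rows <- na.omit(listings)
```

With “na.rm = TRUE” the rows stay available for analyses that do not involve “reviews_per_month”, whereas “na.omit()” removes them from the data frame altogether.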

3.3 Outliers studies

In this section, we check whether the dataset contains extreme values and whether these values are outliers. For a clearer view, we used box plots (Figure 1).

Figure 1: Box plots

These box plots clearly show the extreme values. In the price box plot, the maximum value is around 8000 euros while the minimum is 0. A price of 8000 euros a night is not plausible, so we decided to exclude such values and keep only the listings priced below 3000 euros. We also treated the extreme values in the “minimum_nights” box plot as outliers: since “minimum_nights” is the minimum number of nights the host requires of visitors, a rental that requires a minimum of 10,000 nights cannot be genuine. For that reason, we decided to keep only the rentals where the minimum stay is less than or equal to 365 nights (1 year).
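
The two filtering rules described above could be written along the following lines. This is only a sketch on a made-up data frame (the values are hypothetical); the thresholds are the ones stated in the text.

```r
# Toy data with hypothetical values chosen to hit each filtering rule.
listings <- data.frame(
  price          = c(80, 0, 8000, 120, 3000),
  minimum_nights = c(2, 1, 3, 10000, 365)
)

# Keep prices strictly below 3000 euros and minimum stays of at most 365 nights.
# Note: listings with a price of 0 pass this filter, since only an upper
# bound on price is stated in the text.
filtered <- subset(listings, price < 3000 & minimum_nights <= 365)
```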

4 DATA VISUALIZATION AND ANALYSIS

4.1 Price distribution

Since our project focuses on the rental price, we thought it would be interesting to see how prices are distributed (Figure 2). The first graph in Figure 2 shows the distribution of price over the whole dataset, where most prices fall in the range [0, 300]. To look at the distribution more closely, we built further graphs restricted to the range [0, 300] and then to the range [0, 200]. From these graphs, we concluded that hosts tend to set their price at a round number such as 50 or 60.

Figure 2: Price’s Distribution
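
The round-number pattern can also be checked numerically. The sketch below is an illustration on a hypothetical price vector, not the computation behind Figure 2; treating “round” as “multiple of 10” is our assumption.

```r
# Hypothetical nightly prices in euros
prices <- c(50, 60, 55, 50, 72, 80, 60, 99, 100, 50)

# Frequency of each distinct price, most common first
price_counts <- sort(table(prices), decreasing = TRUE)

# Share of listings whose price is a multiple of 10 (assumed definition of "round")
round_share <- mean(prices %% 10 == 0)
```

A share well above one in ten would suggest that hosts favour round prices, since a price chosen without such a preference would land on a multiple of 10 only about 10% of the time.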

4.2 Correlation Analysis

Figure 3: Correlation Matrix

Figure 3 above shows the correlation matrix of the numeric variables in the dataset. We excluded two variables, “id” and “host_id”, which are stored as numeric but are identifiers with no meaning for our analysis.

In this correlation matrix, the price variable is most correlated with “calculated_host_listings_count” (0.21), followed by “availability_365” (0.18). However, neither correlation is strong.
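
A correlation matrix of this kind could be computed as sketched below. The data frame is hypothetical, and the use of pairwise complete observations to deal with the missing values in “reviews_per_month” is an assumption on our part, not necessarily the setting used for Figure 3.

```r
# Toy data with hypothetical values; column names follow the dataset.
listings <- data.frame(
  id                = 1:6,
  host_id           = 101:106,
  price             = c(80, 55, 104, 45, 35, 150),
  minimum_nights    = c(2, 1, 1, 3, 2, 4),
  reviews_per_month = c(0.46, NA, 0.58, 1.21, NA, 0.10),
  availability_365  = c(282, 0, 365, 2, 0, 300)
)

# Keep numeric columns, then drop the two identifier columns
num_data <- listings[, sapply(listings, is.numeric)]
num_data <- num_data[, !(names(num_data) %in% c("id", "host_id"))]

# Pairwise complete observations: each pair of columns uses the rows
# where both values are present (assumed handling of NAs)
corr <- cor(num_data, use = "pairwise.complete.obs")

# Correlations of price with the other variables, strongest first
price_corr <- sort(corr["price", colnames(corr) != "price"], decreasing = TRUE)
```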

4.3 Average price per geographical area

The map below (Figure 4) shows that prices clearly vary by neighbourhood. In the districts located in the center of Paris or containing tourist landmarks (Élysée, Louvre, Hôtel-de-Ville), the average price is very high, in the range of 120 to 140 euros. The farther an apartment is from the center of Paris, the lower its price.

Figure 4: Average Price per Districts Map
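
The averages displayed on the map come down to a grouped mean. A minimal sketch on hypothetical prices (the neighbourhood names are real, the values are not):

```r
# Toy data with hypothetical prices
listings <- data.frame(
  neighbourhood = c("Louvre", "Louvre", "Reuilly", "Reuilly", "Buttes-Montmartre"),
  price         = c(130, 150, 80, 60, 50)
)

# Mean price per neighbourhood
avg_price <- aggregate(price ~ neighbourhood, data = listings, FUN = mean)

# Most expensive neighbourhoods first
avg_price <- avg_price[order(-avg_price$price), ]
```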

4.4 Average Price per Room type and Neighbourhood

4.4.1 Room_Type Distribution

Airbnb hosts can list entire homes/apartments, private rooms, or shared rooms. The average price of a rental was close to 96 euros per night, but it varied a lot with room type and neighbourhood. The most common room type is “entire home” (renting a complete apartment), which represents 85.8% of all observations. The next group is “private room”, which accounts for 13.1% of the dataset; here you rent a room in an apartment where the host also lives. Finally, “shared room” stands for 1% of the observations (Figure 5).

Figure 5: Room Distribution
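
Percentages like those quoted above can be obtained from a frequency table. The sketch below uses a short hypothetical vector, so its shares do not match the real ones.

```r
# Hypothetical room types for five listings
room_type <- c("Entire home/apt", "Entire home/apt", "Entire home/apt",
               "Private room", "Shared room")

# Share of each room type, in percent, rounded to one decimal
shares <- round(100 * prop.table(table(room_type)), 1)
```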

4.4.2 Distribution of Price per Room type

The rental price also varies by room type. As expected, renting an entire apartment is much more expensive than renting a private room or a shared room. The average rental price of an entire apartment is 102 euros, while it is 39.4 euros for a shared room (Annexe 2).

4.4.3 The Differences in prices

To show the differences in prices, we made a graph crossing the three variables “price”, “room_type”, and “neighbourhood” (Figure 6). The observations are spread all over the city, and neighbourhoods such as Élysée and Louvre are the most expensive places to rent a room or a home. Not surprisingly, this tells us that the location of an apartment has a strong influence on its price. Prices decrease and become much closer to one another as we leave the city center. The cheapest parts of the city are the outer districts of Paris.

Figure 6
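
Crossing the three variables amounts to a two-way table of mean prices. A minimal sketch, assuming a small made-up data frame with the dataset's column names:

```r
# Toy data with hypothetical prices
listings <- data.frame(
  neighbourhood = c("Louvre", "Louvre", "Reuilly", "Reuilly", "Reuilly"),
  room_type     = c("Entire home/apt", "Private room",
                    "Entire home/apt", "Entire home/apt", "Private room"),
  price         = c(150, 70, 90, 70, 40)
)

# Mean price for every neighbourhood (rows) and room type (columns);
# combinations with no listing would appear as NA
price_grid <- tapply(listings$price,
                     list(listings$neighbourhood, listings$room_type),
                     mean)
```

Reading the table along a column compares neighbourhoods for a given room type, which separates the effect of location from the effect of room type.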

4.4.4 Map of offer per district distribution

The closer we get to the city center, the smaller the supply (Louvre 2.09%, Élysée 2.65%); the farther from the center, the larger the supply (Buttes-Montmartre 11.45%). This pattern is consistent with a supply-and-demand explanation of price: the districts with the fewest listings have the highest prices (Figure 7).

Figure 7

4.5 Number of reviews and price

Figure 8 shows the relationship between price and number of reviews for each room type. We can see that apartments with very few reviews tend to be priced higher.

One possible reason is that these apartments are very expensive and people tend to look for the cheapest deals; another is that guests were not satisfied with their stay. We also find that apartments with many reviews tend to be priced lower. However, the trend is not linear: once the number of reviews exceeds 50, the price stays close to the average price, so a large number of reviews does not lead to higher prices.

Figure 8
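
One way to examine this non-linear trend numerically is to group listings by number of reviews and compare the mean price of each group. The sketch below is an illustration only: the data are hypothetical, and the bin edges are our own choice apart from the threshold of 50 mentioned in the text.

```r
# Toy data with hypothetical values
listings <- data.frame(
  number_of_reviews = c(0, 3, 12, 40, 55, 120, 7, 80),
  price             = c(200, 150, 100, 90, 95, 97, 130, 96)
)

# Bins are right-closed: up to 10, 11 to 50, more than 50 reviews
listings$review_bin <- cut(listings$number_of_reviews,
                           breaks = c(-Inf, 10, 50, Inf),
                           labels = c("0-10", "11-50", "51+"))

# Mean price within each bin
mean_by_bin <- tapply(listings$price, listings$review_bin, mean)
```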

5 CONCLUSION

We performed a graphical and statistical analysis of Airbnb listings in Paris. Our findings show that the most influential features are the property information “room type” and the location information “neighbourhood”. These variables play a role in how hosts set the rental price. We also found that the number of reviews has no significant influence on the price.

6 REFERENCES

http://r4ds.had.co.nz/explore-intro.html

https://cseweb.ucsd.edu/classes/wi17/cse258-a/reports/a052.pdf

https://rstudio-pubs-static.s3.amazonaws.com/204195_b17b4f8728ef42cba6cda1759602348d.html

7 ANNEXES

Annexe 1

Annexe 2