Introduction

Goal:


A real estate company has a niche in purchasing two-bedroom properties in New York City to rent out short-term as part of its business model. It wants a data product, backed by a well-reasoned conclusion, that identifies which zip codes will generate the most profit from short-term rentals within New York City.

Datasets:


Cost data: Zillow provides an estimate of the value of two-bedroom properties.

Revenue data: AirBnB is the medium through which the investor plans to lease out their investment property.

Basic Assumptions:


The investor will pay for the property in cash (i.e. no mortgage/interest rate will need to be accounted for).

The time value of money discount rate is 0% (i.e. $1 today is worth the same 100 years from now).

All properties and all square feet within each locale can be assumed to be homogeneous (i.e. a 1000 square foot property in a locale such as Bronx or Manhattan generates twice the revenue and costs twice as much as any other 500 square foot property within that same locale.)
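Under the homogeneity assumption, revenue and cost within a locale scale linearly with floor area, so a per-square-foot rate is enough to translate a figure from one property size to another. The sketch below illustrates this; the helper name `scale_by_area` and the $30,000 figure are hypothetical and not part of the original analysis.

```r
# Hypothetical helper (not from the original analysis): scale a revenue or
# cost figure observed for one property to a property of a different size
# in the SAME locale, assuming values are proportional to square footage.
scale_by_area <- function(reference_value, reference_sqft, target_sqft) {
  stopifnot(reference_sqft > 0, target_sqft >= 0)
  per_sqft <- reference_value / reference_sqft  # value per square foot
  per_sqft * target_sqft
}

# A 1000 sq ft unit earns (and costs) twice as much as a 500 sq ft unit
# in the same locale; $30,000 is a made-up example figure.
scale_by_area(30000, reference_sqft = 500, target_sqft = 1000)
```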

Additional Assumptions:


1. The rental price and cleaning fee increase by 3% per year over the period we investigate.

2. Expenses are not taken into account.

3. A cleaning fee is collected on 60% of occupied nights.

4. The properties are available for rent throughout the year (365 days).

5. The occupancy rate is 75%.
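With 3% yearly growth and a 0% discount rate, cumulative revenue is simply the undiscounted sum of a geometric series. The sketch below, assuming a hypothetical first-year revenue of $50,000, checks that summing year by year gives the same result as the closed form used later in the analysis.

```r
# Cumulative revenue over n years when year-1 revenue r1 grows by g per year
# and the discount rate is 0% (future dollars are simply added up).
cumulative_revenue <- function(r1, n, g = 0.03) {
  years  <- seq_len(n)                # 1, 2, ..., n
  yearly <- r1 * (1 + g)^(years - 1)  # revenue earned in each year
  sum(yearly)                         # no discounting under a 0% rate
}

# Closed form of the same geometric series
cumulative_revenue_closed <- function(r1, n, g = 0.03) {
  r1 * (1 - (1 + g)^n) / (1 - (1 + g))
}

r1 <- 50000  # hypothetical first-year revenue, for illustration only
cumulative_revenue(r1, 10)
cumulative_revenue_closed(r1, 10)
```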


The whole process is:

Data Overview: learn the size of the data, the basic variables, and their relationships.

Data Quality Check: check for fraudulent data, missing values, inconsistent data formats, outliers, etc.

Data Cleaning: fix the problems found in the data quality check and clean the data.

Data Analysis: create new features that help analyze the problem, visualize the data, draw conclusions and make recommendations, and identify future work to improve the current analysis.

Data Overview


Loading Packages & Datasets

#Load packages
library(flexdashboard)  #For dashboard generation
library(dplyr)          #For data cleaning
library(leaflet)        #For data mapping
library(stringr)        #For dealing with strings
library(ggplot2)        #For data visualization
library(psych)          #For descriptive statistics
library(knitr)          #For kable()
library(kableExtra)     #For neat tables


#Import dataset
zillow<-read.csv("Zip_Zhvi_2bedroom.csv",header=TRUE,na.strings = "")
airbnb<-read.csv("listings.csv",header=TRUE,na.strings = "")

Airbnb Dataset

#Dimension
dim(airbnb)
[1] 40753    95
#Overview on neighbourhood
airbnb %>% 
  group_by(neighbourhood_group_cleansed) %>% 
  summarise(unique_zipcodes = n_distinct(zipcode),
            number_properties = n())
# A tibble: 5 x 3
  neighbourhood_group_cleansed unique_zipcodes number_properties
  <fct>                                  <int>             <int>
1 Bronx                                     29               649
2 Brooklyn                                  46             16810
3 Manhattan                                 70             19212
4 Queens                                    69              3821
5 Staten Island                             12               261

Airbnb dataset: 40,753 rows and 95 columns.
Brooklyn and Manhattan have many more listings than the Bronx, Queens, and Staten Island.

#Overview on Dataset
kable(head(airbnb[c("neighbourhood_group_cleansed","zipcode","calendar_updated","room_type","host_verifications","bedrooms","price","weekly_price","cleaning_fee","number_of_reviews")])) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed","responsive"))
neighbourhood_group_cleansed  zipcode  calendar_updated  room_type        host_verifications                         bedrooms  price    weekly_price  cleaning_fee  number_of_reviews
Bronx                         10464    yesterday         Private room     ['email', 'phone', 'reviews', 'jumio']     1         $99.00   NA            NA            25
Bronx                         10464    6 months ago      Private room     ['phone', 'facebook']                      1         $200.00  NA            NA            0
Bronx                         10464    11 months ago     Entire home/apt  ['email', 'phone']                         3         $300.00  NA            $100.00       0
Bronx                         10464    2 weeks ago       Entire home/apt  ['email', 'phone', 'reviews', 'kba']       1         $125.00  $775.00       $75.00        12
Bronx                         10464    yesterday         Private room     ['email', 'phone', 'reviews', 'jumio']     1         $69.00   $350.00       $17.00        86
Bronx                         10464    a week ago        Entire home/apt  ['email', 'phone', 'facebook', 'reviews']  0         $125.00  $550.00       $35.00        41

Zillow Dataset

#Dimension
dim(zillow)
[1] 8946  262
#Overview on city and region
zillow %>% 
  summarise(unique_cities = n_distinct(City),
            unique_zipcodes = n_distinct(RegionName))
  unique_cities unique_zipcodes
1          4684            8946

Zillow dataset: 8,946 rows and 262 columns.
The Zillow data cover 4,684 cities, but we only need New York City.

#Dataset
kable(head(zillow[,1:15])) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed","responsive"))
RegionID  RegionName  City      State  Metro     CountyName  SizeRank  X1996.04  X1996.05  X1996.06  X1996.07  X1996.08  X1996.09  X1996.10  X1996.11
61639     10025       New York  NY     New York  New York    1         NA        NA        NA        NA        NA        NA        NA        NA
84654     60657       Chicago   IL     Chicago   Cook        2         167700    166400    166700    167200    166900    166900    168000    170100
61637     10023       New York  NY     New York  New York    3         NA        NA        NA        NA        NA        NA        NA        NA
84616     60614       Chicago   IL     Chicago   Cook        4         195800    193500    192600    192300    192600    193600    195500    197600
93144     79936       El Paso   TX     El Paso   El Paso     5         59100     60500     60900     60800     60300     60400     61200     61700
84640     60640       Chicago   IL     Chicago   Cook        6         123300    122600    122000    121500    120900    120600    120900    121300

Data Quality


Airbnb Dataset

Missing Value

#Select important features of two-bedroom properties in the Airbnb dataset
airbnb2 <- airbnb[c("neighbourhood_group_cleansed", "zipcode", "country_code", "latitude", "longitude",
                    "is_location_exact", "bedrooms", "price", "weekly_price", "monthly_price",
                    "cleaning_fee", "number_of_reviews")] %>%
  filter(bedrooms == 2)
#4894 rows remain

#Check missing value for important features in Airbnb dataset
missing_airbnb<-as.data.frame(sapply(airbnb2, function(x) sum(is.na (x))/nrow(airbnb2)))
names(missing_airbnb)[1]<-"missing"
missing_airbnb
                                missing
neighbourhood_group_cleansed 0.00000000
zipcode                      0.01266857
country_code                 0.00000000
latitude                     0.00000000
longitude                    0.00000000
is_location_exact            0.00000000
bedrooms                     0.00000000
price                        0.00000000
weekly_price                 0.78933388
monthly_price                0.82611361
cleaning_fee                 0.19738455
number_of_reviews            0.00000000
#weekly_price and monthly_price both have more than 70% missing values, so we drop these two features
airbnb2<-airbnb2[,!colnames(airbnb2) %in% c("weekly_price","monthly_price")]

For price data, we use only the daily price of each property, because weekly_price and monthly_price have too many missing values.

zipcode and cleaning_fee also have missing values (about 1.3% and 19.7% of rows, respectively); we deal with them during data cleaning.
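The share of missing values per column can also be computed with `colMeans(is.na(...))`, because `is.na()` on a data frame returns a logical matrix and the mean of a logical column is the fraction of `TRUE` values. A minimal sketch on a made-up data frame standing in for `airbnb2`:

```r
# Toy data frame standing in for airbnb2 (all values are made up)
toy <- data.frame(
  zipcode      = c("10025", NA, "11234", "10003"),
  price        = c("$99.00", "$150.00", "$200.00", "$75.00"),
  cleaning_fee = c(NA, "$50.00", NA, "$20.00"),
  stringsAsFactors = FALSE
)

# Fraction of missing values per column (TRUE counts as 1, FALSE as 0)
missing_share <- colMeans(is.na(toy))
missing_share

# Columns above a 70% threshold would be dropped, as was done for
# weekly_price and monthly_price
names(missing_share)[missing_share > 0.7]
```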


Outlier

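airbnb2 %>%
  #strip "$" and "," before converting; as.numeric() on a factor would return level codes
  mutate(price = as.numeric(gsub("[$,]", "", as.character(price)))) %>%
  ggplot(aes(x = neighbourhood_group_cleansed, y = price)) +
  geom_boxplot() +
  labs(title = "Neighbourhood vs Price", x = "Neighbourhood", y = "Price")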

The boxplot shows clear outliers in the Brooklyn, Queens, and Manhattan price data, so we need to deal with them during data cleaning.
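The points drawn individually in a boxplot are those returned by `boxplot.stats()`: values more than 1.5 times the hinge spread (`coef = 1.5` by default) beyond the lower or upper hinge. A minimal sketch on made-up prices shows which values the rule flags and how they are filtered out:

```r
# Made-up nightly prices with one extreme value
prices <- c(1:10, 100)

stats <- boxplot.stats(prices)  # default coef = 1.5
stats$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
stats$out    # values treated as outliers

# Keep only non-outliers, mirroring the filter used in data cleaning
prices_clean <- prices[!prices %in% stats$out]
prices_clean
```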

Zillow Dataset

Missing Value

#Select important features in Zillow dataset: zipcode, city, size rank and the latest (June 2017) value
zillow2 <- subset(zillow, City == "New York", select = c(RegionName, City, SizeRank, X2017.06))

#Check missing value for important features in Zillow dataset
missing_zillow<-as.data.frame(sapply(zillow2, function(x) sum(is.na (x))/nrow(zillow2)))
names(missing_zillow)[1]<-"missing"
missing_zillow
           missing
RegionName       0
City             0
SizeRank         0
X2017.06         0

We keep only the latest purchase price (X2017.06, June 2017) because it best matches the time frame of the Airbnb price data. The Zillow data are clean, and the selected features have no missing values.

Data Cleaning


Summary of data cleaning

(1) Airbnb Dataset:

zipcode:
Keep only zipcodes with 5 digits
Remove missing values

Price:
No missing values (checked above)
Remove the dollar sign (and thousands separators) and convert to numeric

Cleaning fee:
Remove the dollar sign and convert to numeric
Fill missing values with the mean cleaning fee

Others:
Keep only listings where "is_location_exact" is true
Remove price outliers (values beyond the boxplot whiskers of the overall price distribution)


(2) Zillow Dataset:

zipcode:
Keep only zipcodes with 5 digits
Rename the column RegionName to zipcode
Keep only zipcodes in New York City

Price:
Rename the column X2017.06 to purchase_price
Make sure there are no missing values
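The price, cleaning-fee, and zipcode steps above can be sketched in base R. The helper name `parse_dollar` and the sample values are hypothetical; the zipcode check is a stricter, digits-only variant of the `str_length(zipcode) == 5` test used in the cleaning code. Note why `as.character()` matters: when a column is read as a factor, `as.numeric()` returns the internal level codes rather than the prices.

```r
# Hypothetical helper: convert Airbnb-style price strings such as "$1,200.00"
# to numbers, removing "$" and thousands separators first.
parse_dollar <- function(x) {
  as.numeric(gsub("[$,]", "", as.character(x)))
}

fees <- factor(c("$99.00", "$150.00"))
as.numeric(fees)    # level codes, NOT prices
parse_dollar(fees)  # 99 150

# Mean imputation of missing cleaning fees (made-up values)
cleaning_fee <- parse_dollar(c("$50.00", NA, "$1,000.00", "$30.00"))
cleaning_fee[is.na(cleaning_fee)] <- mean(cleaning_fee, na.rm = TRUE)
cleaning_fee        # 50 360 1000 30

# Keep only 5-digit zipcodes (drops NA, ZIP+4 codes and short values)
zips <- c("10025", "11385-2308", NA, "1002")
zips[!is.na(zips) & grepl("^[0-9]{5}$", zips)]  # "10025"
```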

Airbnb Dataset

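airbnb_final <- airbnb2 %>%
  #strip "$" and "," before converting; as.character() avoids converting factor level codes
  mutate(price = as.numeric(gsub("[$,]", "", as.character(price))),
         cleaning_fee = as.numeric(gsub("[$,]", "", as.character(cleaning_fee)))) %>%
  #impute missing cleaning fees with the mean cleaning fee
  mutate(cleaning_fee = ifelse(is.na(cleaning_fee), mean(cleaning_fee, na.rm = TRUE), cleaning_fee)) %>%
  #keep listings with an exact location and a 5-character zipcode
  filter(is_location_exact == "t" & !is.na(zipcode) & str_length(as.character(zipcode)) == 5) %>%
  #drop price outliers (beyond the boxplot whiskers of the overall price distribution)
  filter(!price %in% boxplot.stats(price)$out)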


kable(head(airbnb_final)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed","responsive"))
neighbourhood_group_cleansed zipcode country_code latitude longitude is_location_exact bedrooms price cleaning_fee number_of_reviews
Bronx 10462 US 40.85753 -73.86605 t 2 65 85.82307 4
Bronx 10469 US 40.87054 -73.84681 t 2 87 75.00000 31
Queens 11102 US 40.77172 -73.91769 t 2 151 85.82307 0
Queens 11102 US 40.77937 -73.91553 t 2 205 85.82307 0
Queens 11105 US 40.78080 -73.91025 t 2 516 400.00000 1
Queens 11105 US 40.77862 -73.92149 t 2 178 95.00000 15

Zillow Dataset

zillow_final<- zillow2%>%
  rename(zipcode=RegionName,purchase_price=X2017.06)%>%
  filter(!is.na(zipcode) & (str_length(zipcode))==5)

kable(head(zillow_final)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed","responsive"))
zipcode  City      SizeRank  purchase_price
10025    New York  1         1431000
10023    New York  3         2142300
10128    New York  14        1787100
10011    New York  15        2480400
10003    New York  21        2147000
11201    New York  32        1420700

Merge Two Datasets

data<-merge(x=airbnb_final, y=zillow_final, by="zipcode")

kable(head(data)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed","responsive"))
zipcode  neighbourhood_group_cleansed  country_code  latitude  longitude  is_location_exact  bedrooms  price  cleaning_fee  number_of_reviews  City      SizeRank  purchase_price
10003    Manhattan                     US            40.73797  -73.98778  t                  2         412    300.00000     1                  New York  21        2147000
10003    Manhattan                     US            40.72650  -73.98834  t                  2         88     30.00000      12                 New York  21        2147000
10003    Manhattan                     US            40.73682  -73.98336  t                  2         136    75.00000      1                  New York  21        2147000
10003    Manhattan                     US            40.72513  -73.98952  t                  2         244    109.00000     86                 New York  21        2147000
10003    Manhattan                     US            40.72917  -73.98811  t                  2         162    80.00000      134                New York  21        2147000
10003    Manhattan                     US            40.73019  -73.98630  t                  2         378    85.82307      0                  New York  21        2147000
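`merge()` performs an inner join by default, so only zipcodes present in both the Airbnb and Zillow data survive. A minimal sketch on toy data frames (two rent prices and the purchase prices are taken from the tables above; the rest are made up) shows which rows are dropped, and how `all.x = TRUE` can be used to count them:

```r
# Toy data frames; values are partly taken from the tables above, partly made up
listings <- data.frame(zipcode = c("10003", "10003", "10025", "11385"),
                       price   = c(412, 88, 150, 120),
                       stringsAsFactors = FALSE)
zillow_prices <- data.frame(zipcode = c("10003", "10025", "11201"),
                            purchase_price = c(2147000, 1431000, 1420700),
                            stringsAsFactors = FALSE)

# Inner join: 11385 (no Zillow price) and 11201 (no listings) are dropped
inner <- merge(listings, zillow_prices, by = "zipcode")
nrow(inner)  # 3

# Left join keeps every listing, with NA where no purchase price exists,
# which makes it easy to count how many listings the inner join discards
left <- merge(listings, zillow_prices, by = "zipcode", all.x = TRUE)
sum(is.na(left$purchase_price))  # 1
```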

Data Analysis


Distribution


The map shows that most of the properties are in Manhattan, while other areas have far fewer listings. More listings mean more opportunities and more growth, whereas the other areas may be less competitive and are much cheaper.


myPopup <- paste(sep = '<br/>',
                 paste('<b>Zipcode</b>:', data$zipcode),
                 paste('<b>Neighborhood</b>:', data$neighbourhood_group_cleansed),
                 paste('<b>Rent price</b>:', data$price),
                 paste('<b>Purchase price</b>:', data$purchase_price),
                 paste('<b>Size rank</b>:', data$SizeRank))  #Zillow SizeRank: 1 = largest region
leaflet(data) %>%
  addProviderTiles(providers$Esri.NatGeoWorldMap) %>%
  addCircleMarkers(lng = ~longitude, lat = ~latitude, radius = 10, opacity = 1,
                   popup = myPopup, clusterOptions = markerClusterOptions())

Feature Engineering

Additional Assumptions


1. The rental price and cleaning fee increase by 3% per year over the period we investigate.
2. Expenses are not taken into account.
3. A cleaning fee is collected on 60% of occupied nights.
4. The properties are available for rent throughout the year (365 days).
5. The occupancy rate is 75%.


Formula

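\[ExpectedRevenue_N = (DailyPrice + 0.6 \times CleaningFee) \times 365 \times OccupancyRate \times \frac{1-1.03^{N}}{1-1.03}\]

\[ReturnRate_N = \frac{ExpectedRevenue_N - TotalCost}{TotalCost}\] \[BreakEven = \log_{1.03}\left(1 - \frac{TotalCost\,(1-1.03)}{ExpectedRevenue_1}\right)\]

where N is the number of years of operation, so ExpectedRevenue_1 is the first-year revenue.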
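As a worked example of these formulas, the sketch below plugs in the per-zipcode averages for zipcode 10003 from the feature table (values rounded as printed there) and reproduces its first-year revenue, 10-year return rate, and break-even period up to rounding. The variable names are illustrative.

```r
# Inputs for zipcode 10003, copied from the feature table (rounded)
daily_price    <- 231.3796  # mean nightly rent ($)
cleaning_fee   <- 94.06417  # mean cleaning fee ($)
total_cost     <- 2147000   # Zillow purchase price, June 2017 ($)
occupancy_rate <- 0.75      # occupancy rate used in the analysis
cleaning_share <- 0.6       # share of occupied nights that incur cleaning
growth         <- 1.03      # 3% yearly growth in rent and cleaning fee

# First-year revenue (N = 1)
revenue_y1 <- (daily_price + cleaning_share * cleaning_fee) * 365 * occupancy_rate

# Cumulative revenue and return rate after n years (geometric series)
expected_revenue <- function(n) revenue_y1 * (1 - growth^n) / (1 - growth)
return_rate      <- function(n) (expected_revenue(n) - total_cost) / total_cost

# Years until cumulative revenue equals the purchase price
break_even <- log(1 - total_cost * (1 - growth) / revenue_y1, base = growth)

round(revenue_y1, 2)       # about 78790.21
round(return_rate(10), 4)  # about -0.5793
round(break_even, 2)       # about 20.21
```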


Create new features

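growth <- 1.03         #3% yearly increase in rent and cleaning fee
cleaning_share <- 0.6  #share of occupied nights that incur a cleaning fee
data2 <- data %>%
  group_by(zipcode) %>%
  summarise(mean_rent_price = mean(price), mean_reviews = mean(number_of_reviews),
            mean_cleaning_fee = mean(cleaning_fee), purchase_price = mean(purchase_price),
            neighbourhood = first(neighbourhood_group_cleansed), SizeRank = first(SizeRank)) %>%
  mutate(occupancy_rate = 0.75,
         #first-year revenue: nightly rent plus cleaning fees over 365 available days
         revenue_y01 = (mean_rent_price + cleaning_share * mean_cleaning_fee) * 365 * occupancy_rate,
         #return after N years: cumulative revenue (geometric series) relative to the purchase price
         return_rate_y01 = revenue_y01 / purchase_price - 1,
         return_rate_y05 = revenue_y01 * (1 - growth^5) / (1 - growth) / purchase_price - 1,
         return_rate_y10 = revenue_y01 * (1 - growth^10) / (1 - growth) / purchase_price - 1,
         return_rate_y20 = revenue_y01 * (1 - growth^20) / (1 - growth) / purchase_price - 1,
         return_rate_y30 = revenue_y01 * (1 - growth^30) / (1 - growth) / purchase_price - 1,
         #years until cumulative revenue equals the purchase price
         break_even = log(1 - purchase_price * (1 - growth) / revenue_y01, base = growth))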

kable(data2) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed","responsive"))
zipcode  mean_rent_price  mean_reviews  mean_cleaning_fee  purchase_price  neighbourhood  SizeRank  occupancy_rate  revenue_y01  return_rate_y01  return_rate_y05  return_rate_y10  return_rate_y20  return_rate_y30  break_even
10003    231.3796         26.583333     94.06417           2147000         Manhattan      21        0.75            78790.21     -0.9633022       -0.8051663       -0.5793007       -0.0139159       0.7459139        20.212421
10011    263.5057         12.206897     107.09186          2480400         Manhattan      15        0.75            89724.54     -0.9638266       -0.8079504       -0.5853123       -0.0280068       0.7209652        20.432301
10013    248.3286         18.700000     101.40835          3316500         Manhattan      1744      0.75            84636.27     -0.9744802       -0.8645122       -0.7074446       -0.3142746       0.2141130        26.296263
10014    244.3827         16.493827     95.23447           2491600         Manhattan      379       0.75            82542.03     -0.9668719       -0.8241183       -0.6202232       -0.1098349       0.5760842        21.813597
10021    147.8000         19.266667     73.88615           1815600         Manhattan      190       0.75            52596.05     -0.9710310       -0.8461998       -0.6679033       -0.2215931       0.3782105        24.046517
10022    243.9000         25.800000     101.29795          2031600         Manhattan      894       0.75            83405.81     -0.9589458       -0.7820374       -0.5293590       0.1031431        0.9531730        18.557901
10023    236.2037         13.666667     101.67094          2142300         Manhattan      3         0.75            81360.22     -0.9620220       -0.7983698       -0.5646251       0.0204824        0.8068179        19.695567
10025    205.6465         17.525253     92.38135           1431000         Manhattan      1         0.75            71469.36     -0.9500564       -0.7348424       -0.4274521       0.3420045        1.3760897        15.914944
10028    214.1379         5.137931      97.34270           2083900         Manhattan      109       0.75            74608.80     -0.9641975       -0.8099197       -0.5895646       -0.0379738       0.7033181        20.590831
10036    267.6071         25.297619     103.19038          1712900         Manhattan      580       0.75            90206.48     -0.9473370       -0.7204049       -0.3962776       0.4150749        1.5054648        15.252949
10128    208.0208         6.854167      81.83077           1787100         Manhattan      14        0.75            70386.41     -0.9606142       -0.7908953       -0.5484857       0.0583118        0.8737970        19.157631
10304    278.5000         1.000000      52.91153           328300          Staten Island  1958      0.75            84930.09     -0.7413034       0.3734554        1.9656666        5.9512746        11.3075984       3.711935
10305    90.0000          3.333333      60.13718           425100          Staten Island  2087      0.75            34515.03     -0.9188073       -0.5689367       -0.0692163       1.1816792        2.8627781        10.637754
10306    269.0000         71.500000     65.00000           352900          Staten Island  668       0.75            84315.00     -0.7610796       0.2684607        1.7389543        5.4198799        10.3667361       4.001684
10308    473.0000         32.000000     25.00000           409500          Staten Island  4149      0.75            133590.00    -0.6737729       0.7319840        2.7398282        7.7658445        14.5203902       2.976260
10312    167.0000         0.000000      85.82307           355000          Staten Island  764       0.75            59812.69     -0.8315136       -0.1054826       0.9315083        3.5272939        7.0158127        5.543713
11201    170.8254         18.206349     85.91404           1420700         Brooklyn       32        0.75            60874.83     -0.9571515       -0.7725116       -0.5087902       0.1513547        1.0385342        17.954432
11215    144.4769         14.253846     82.19266           1070800         Brooklyn       71        0.75            53050.70     -0.9504569       -0.7369692       -0.4320444       0.3312404        1.3570314        16.017441
11217    165.6790         23.135803     88.92877           1302300         Brooklyn       1555      0.75            59961.18     -0.9539575       -0.7555540       -0.4721740       0.2371799        1.1904923        16.973871
11231    166.1719         16.437500     89.21682           1202900         Brooklyn       1817      0.75            60143.41     -0.9500013       -0.7345502       -0.4268211       0.3434833        1.3787081        15.900966
11234    156.2500         19.500000     80.41153           476900          Brooklyn       52        0.75            55981.03     -0.8826147       -0.3767857       0.3456905        2.1541860        4.5846527        7.699521
11434    335.0000         16.000000     46.10769           382300          Queens         622       0.75            99279.44     -0.7403101       0.3787288        1.9770533        5.9779641        11.3548536       3.698486

Analysis and Conclusion

Summary:

  1. Although Staten Island has the highest return rates and the shortest break-even periods, it has far fewer potential customers than Manhattan and Brooklyn.

  2. Although Manhattan has the highest property costs and much longer break-even periods than other areas, some of its zipcodes are still worth investing in. Manhattan real estate may be more likely to appreciate, and its higher population density could mean a higher occupancy rate.


Recommendation:

I would recommend zipcode 10025 in Manhattan and zipcode 11234 in Brooklyn, for the following reasons:

Population: Both zipcodes have a high Zillow size ranking compared with the other zipcodes (SizeRank 1 for 10025 and 52 for 11234; a lower number means a larger region).

Break-even period: Both are in the top 10 for the shortest break-even period (about 7.7 years for 11234 and 15.9 years for 10025), well under 20 years.

Return rate: After ten years, 11234 could reach a positive return rate of about 35%, while the return rate of 10025 rises faster than those of most Manhattan zipcodes (about -43% after ten years, second only to 10036 in Manhattan).

Total cost: These are the cheapest zipcodes in Manhattan ($1,431,000) and Brooklyn ($476,900), so the company can better control the budget for this investment.

Reviews: The average number of reviews in these two zipcodes is in the middle-to-high range, which suggests steadier guest demand than in many other Manhattan and Brooklyn zipcodes.
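The break-even and total-cost criteria above can be checked directly against the per-zipcode feature table. A minimal sketch, with `purchase_price` and `break_even` copied from `data2` and the data frame name `zip_summary` chosen for illustration:

```r
# Values copied from the per-zipcode feature table (break-even in years)
zip_summary <- data.frame(
  zipcode = c("10003", "10011", "10013", "10014", "10021", "10022", "10023",
              "10025", "10028", "10036", "10128", "10304", "10305", "10306",
              "10308", "10312", "11201", "11215", "11217", "11231", "11234",
              "11434"),
  neighbourhood = c(rep("Manhattan", 11), rep("Staten Island", 5),
                    rep("Brooklyn", 5), "Queens"),
  purchase_price = c(2147000, 2480400, 3316500, 2491600, 1815600, 2031600,
                     2142300, 1431000, 2083900, 1712900, 1787100, 328300,
                     425100, 352900, 409500, 355000, 1420700, 1070800,
                     1302300, 1202900, 476900, 382300),
  break_even = c(20.212421, 20.432301, 26.296263, 21.813597, 24.046517,
                 18.557901, 19.695567, 15.914944, 20.590831, 15.252949,
                 19.157631, 3.711935, 10.637754, 4.001684, 2.976260,
                 5.543713, 17.954432, 16.017441, 16.973871, 15.900966,
                 7.699521, 3.698486),
  stringsAsFactors = FALSE
)

# Rank zipcodes by break-even period (1 = fastest payback)
zip_summary$break_even_rank <- rank(zip_summary$break_even)

# Flag the cheapest zipcode within each borough
zip_summary$cheapest_in_borough <- zip_summary$purchase_price ==
  ave(zip_summary$purchase_price, zip_summary$neighbourhood, FUN = min)

zip_summary[zip_summary$zipcode %in% c("10025", "11234"),
            c("zipcode", "neighbourhood", "break_even", "break_even_rank",
              "cheapest_in_borough")]
```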

Future Work


In the future, we need to investigate:

  1. How does the macroeconomic environment, for example, a change in the number of tourists, affect the return rate of real estate?

  2. Consider how the investment may change in the future, for example, if the company wants to sell its properties at some point. In this case, we need to account for more financial variables, such as the appreciation of real estate.

  3. Consider other expenses the investment will bring, for example, taxes, maintenance costs, and marketing costs.

  4. Would buying more properties create economies of scale? These could help revenue grow.

  5. What is the company's market strategy? Does it want to target the high-price market with a smaller sales volume, or the low-price market with a larger sales volume?


Breakeven

breakeven<-data2%>%
  ggplot(aes(x = reorder(zipcode, break_even), y = break_even,fill=neighbourhood)) +
  geom_bar(stat = "identity") +
  scale_fill_brewer(palette="Blues")+
  labs(title="Average Breakeven Period vs Zipcodes",x = "Zipcode", y = "Breakeven Period")+
  theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5),
        legend.position = "bottom",
        legend.title = element_text(face = "bold"),
        panel.background = element_rect(fill = "white", colour = "white"),
        panel.grid.major = element_line(colour = "grey70",linetype = "dashed"),
        panel.grid.major.y = element_blank(),
        axis.line = element_line(size = 1, colour = "grey80"),
        axis.text = element_text(colour = "peru"),
        axis.ticks.length = unit(0.25, "cm"),
        axis.ticks = element_line(size = 0.2))+
  coord_flip()

breakeven

Return Rate

return_rate_y10_plot <- data2 %>%
  ggplot(aes(x = zipcode, y = return_rate_y10, fill = neighbourhood)) +
  geom_col() +  #bar heights taken directly from the data
  scale_fill_brewer(palette = "Blues") +
  labs(title = "Return rate over 10 years vs Zipcodes", x = "Zipcode", y = "Return rate") +
  theme(plot.title = element_text(size = 12, face = "bold", hjust = 0.5),
        legend.position = "bottom",
        legend.title = element_text(face = "bold"),
        panel.background = element_rect(fill = "white", colour = "white"),
        panel.grid.major = element_line(colour = "grey70", linetype = "dashed"),
        panel.grid.major.y = element_blank(),
        axis.line = element_line(size = 1, colour = "grey80"),
        axis.text = element_text(colour = "peru"),
        axis.ticks.length = unit(0.25, "cm"),
        axis.ticks = element_line(size = 0.2)) +
  coord_flip()
return_rate_y10_plot

Revenue

revenue_year<-
  ggplot(data=data2,aes(x = zipcode, y = revenue_y01,fill=neighbourhood)) +
  geom_col() +  #bar heights taken directly from the data
  scale_fill_brewer(palette="Blues")+
  labs(title="Average Revenue/year vs Zipcodes",x = "Zipcode", y = "Average Revenue")+
    theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5),
        legend.position = "bottom",
        legend.title = element_text(face = "bold"),
        panel.background = element_rect(fill = "white", colour = "white"),
        panel.grid.major = element_line(colour = "grey70",linetype = "dashed"),
        panel.grid.major.y = element_blank(),
        axis.line = element_line(size = 1, colour = "grey80"),
        axis.text = element_text(colour = "peru"),
        axis.ticks.length = unit(0.25, "cm"),
        axis.ticks = element_line(size = 0.2))+
  coord_flip()

revenue_year

Total Cost

Total_cost<-data2%>%
  ggplot(aes(x = zipcode, y = purchase_price,fill=neighbourhood)) +
  geom_col() +  #bar heights taken directly from the data
  scale_fill_brewer(palette="Blues")+
  labs(title="Total cost vs Zipcodes",x = "Zipcode", y = "Total cost (purchase price)")+
    theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5),
        legend.position = "bottom",
        legend.title = element_text(face = "bold"),
        panel.background = element_rect(fill = "white", colour = "white"),
        panel.grid.major = element_line(colour = "grey70",linetype = "dashed"),
        panel.grid.major.y = element_blank(),
        axis.line = element_line(size = 1, colour = "grey80"),
        axis.text = element_text(colour = "peru"),
        axis.ticks.length = unit(0.25, "cm"),
        axis.ticks = element_line(size = 0.2))+
  coord_flip()

Total_cost

Population

sizerank<-data2%>%
  ggplot(aes(x = zipcode, y = SizeRank,fill=neighbourhood)) +
  geom_col() +  #bar heights taken directly from the data
  scale_fill_brewer(palette="Blues")+
  labs(title="Population Rank vs Zipcodes",x = "Zipcode", y = "Population Rank")+
    theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5),
        legend.position = "bottom",
        legend.title = element_text(face = "bold"),
        panel.background = element_rect(fill = "white", colour = "white"),
        panel.grid.major = element_line(colour = "grey70",linetype = "dashed"),
        panel.grid.major.y = element_blank(),
        axis.line = element_line(size = 1, colour = "grey80"),
        axis.text = element_text(colour = "peru"),
        axis.ticks.length = unit(0.25, "cm"),
        axis.ticks = element_line(size = 0.2))+
  coord_flip()

sizerank

Review

number_review<- data2%>%
  ggplot(aes(x = zipcode, y = mean_reviews,fill=neighbourhood)) +
  geom_col() +  #bar heights taken directly from the data
  scale_fill_brewer(palette="Blues")+
  labs(title="Number of reviews vs Zipcodes",x = "Zipcode", y = "Number of Reviews")+
    theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5),
        legend.position = "bottom",
        legend.title = element_text(face = "bold"),
        panel.background = element_rect(fill = "white", colour = "white"),
        panel.grid.major = element_line(colour = "grey70",linetype = "dashed"),
        panel.grid.major.y = element_blank(),
        axis.line = element_line(size = 1, colour = "grey80"),
        axis.text = element_text(colour = "peru"),
        axis.ticks.length = unit(0.25, "cm"),
        axis.ticks = element_line(size = 0.2))+
  coord_flip()

number_review