Introduction

Goal:

A real estate company specializes in buying two-bedroom properties in New York City to rent out short-term as part of its business model. The company wants a data product that reaches a well-reasoned conclusion about which New York City zip codes will generate the most profit from short-term rentals.

Datasets:

Cost data: Zillow provides estimated values for two-bedroom properties.

Revenue data: Airbnb is the platform through which the investor plans to lease out the investment property.

Basic Assumptions:

The investor will pay for the property in cash (i.e. no mortgage/interest rate will need to be accounted for).

The time value of money discount rate is 0% (i.e. $1 today is worth the same 100 years from now).

All properties and all square feet within each locale can be assumed to be homogeneous (i.e. a 1000-square-foot property in a locale such as the Bronx or Manhattan generates twice the revenue and costs twice as much as any 500-square-foot property within that same locale).
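As a worked example of the homogeneity assumption, here is a minimal sketch with a hypothetical helper, `scale_by_area()`, and illustrative numbers: within one locale, any cost or revenue figure scales linearly with square footage.

```r
# Hypothetical helper: under the homogeneity assumption, an amount (cost or revenue)
# scales linearly with square footage within a single locale
scale_by_area <- function(amount, from_sqft, to_sqft) amount * to_sqft / from_sqft

# Illustrative numbers: a 500 sq ft unit valued at $400,000 implies $800,000 for 1000 sq ft
scale_by_area(400000, from_sqft = 500, to_sqft = 1000)
```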

Additional Assumptions:

1. The rental price and cleaning fee increase by 3% per year over the period we investigate.

2. Expenses are not taken into account.

3. A cleaning fee is charged on 60% of occupied nights.

4. The properties are available to rent throughout the year.

5. The occupancy rate is 75%.

The analysis proceeds in four steps:

Data Overview: understand the size of the data, the basic variables, and their relationships.

Data Quality Check: check for fraudulent records, missing values, inconsistent data formats, outliers, etc.

Data Cleaning: fix the problems found in the data quality check and clean the data.

Data Analysis: create new features that help analyze the problem, visualize the data, draw conclusions and recommendations, and consider future work to improve the current analysis.

Data Overview


Loading Packages & Datasets

#Load Packages
library(flexdashboard)  #For dashboard generation
library(dplyr)  #For data cleaning
library(leaflet)  #For data mapping
library(stringr)  #For dealing with string datatype
library(ggplot2)  #For data visualization
library(psych)  #For descriptive statistics
library(kableExtra)   #For neat tables


#Import dataset
zillow<-read.csv("Zip_Zhvi_2bedroom.csv",header=TRUE,na.strings = "")
airbnb<-read.csv("listings.csv",header=TRUE,na.strings = "")

Airbnb Dataset

#Dimension
dim(airbnb)

[1] 40753    95

#Overview on neighbourhood
airbnb %>% 
  group_by(neighbourhood_group_cleansed) %>% 
  summarise(unique_zipcodes = n_distinct(zipcode),
            number_properties = n())

# A tibble: 5 x 3
  neighbourhood_group_cleansed unique_zipcodes number_properties
  <fct>                                  <int>             <int>
1 Bronx                                     29               649
2 Brooklyn                                  46             16810
3 Manhattan                                 70             19212
4 Queens                                    69              3821
5 Staten Island                             12               261

Airbnb Dataset: 40753 rows and 95 columns.
Brooklyn and Manhattan have many more listings than the Bronx, Queens, and Staten Island.

#Overview on Dataset
kable(head(airbnb[c("neighbourhood_group_cleansed","zipcode","calendar_updated","room_type","host_verifications","bedrooms","price","weekly_price","cleaning_fee","number_of_reviews")])) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed","responsive"))

neighbourhood_group_cleansed	zipcode	calendar_updated	room_type	host_verifications	bedrooms	price	weekly_price	cleaning_fee	number_of_reviews
Bronx	10464	yesterday	Private room	[‘email’, ‘phone’, ‘reviews’, ‘jumio’]	1	$99.00	NA	NA	25
Bronx	10464	6 months ago	Private room	[‘phone’, ‘facebook’]	1	$200.00	NA	NA	0
Bronx	10464	11 months ago	Entire home/apt	[‘email’, ‘phone’]	3	$300.00	NA	$100.00	0
Bronx	10464	2 weeks ago	Entire home/apt	[‘email’, ‘phone’, ‘reviews’, ‘kba’]	1	$125.00	$775.00	$75.00	12
Bronx	10464	yesterday	Private room	[‘email’, ‘phone’, ‘reviews’, ‘jumio’]	1	$69.00	$350.00	$17.00	86
Bronx	10464	a week ago	Entire home/apt	[‘email’, ‘phone’, ‘facebook’, ‘reviews’]	0	$125.00	$550.00	$35.00	41

Zillow Dataset

#Dimension
dim(zillow)

[1] 8946  262

#Overview on city and region
zillow %>% 
  summarise(unique_cities = n_distinct(City),
            unique_zipcodes = n_distinct(RegionName))

  unique_cities unique_zipcodes
1          4684            8946

Zillow Dataset: 8946 rows and 262 columns.
The Zillow data cover 4684 cities, but we only need the New York City zip codes.

#Dataset
kable(head(zillow[,1:15])) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed","responsive"))

RegionID	RegionName	City	State	Metro	CountyName	SizeRank	X1996.04	X1996.05	X1996.06	X1996.07	X1996.08	X1996.09	X1996.10	X1996.11
61639	10025	New York	NY	New York	New York	1	NA	NA	NA	NA	NA	NA	NA	NA
84654	60657	Chicago	IL	Chicago	Cook	2	167700	166400	166700	167200	166900	166900	168000	170100
61637	10023	New York	NY	New York	New York	3	NA	NA	NA	NA	NA	NA	NA	NA
84616	60614	Chicago	IL	Chicago	Cook	4	195800	193500	192600	192300	192600	193600	195500	197600
93144	79936	El Paso	TX	El Paso	El Paso	5	59100	60500	60900	60800	60300	60400	61200	61700
84640	60640	Chicago	IL	Chicago	Cook	6	123300	122600	122000	121500	120900	120600	120900	121300

Data Quality


Airbnb Dataset

Missing Value

#Select important features of two-bedroom properties in the Airbnb dataset
airbnb2<-airbnb[c("neighbourhood_group_cleansed","zipcode","country_code","latitude","longitude","is_location_exact","bedrooms","price","weekly_price","monthly_price","cleaning_fee","number_of_reviews")]%>%filter(bedrooms==2)
#with 4892 rows

#Check missing value for important features in Airbnb dataset
missing_airbnb<-as.data.frame(sapply(airbnb2, function(x) sum(is.na (x))/nrow(airbnb2)))
names(missing_airbnb)[1]<-"missing"
missing_airbnb

                                missing
neighbourhood_group_cleansed 0.00000000
zipcode                      0.01266857
country_code                 0.00000000
latitude                     0.00000000
longitude                    0.00000000
is_location_exact            0.00000000
bedrooms                     0.00000000
price                        0.00000000
weekly_price                 0.78933388
monthly_price                0.82611361
cleaning_fee                 0.19738455
number_of_reviews            0.00000000

#weekly_price and monthly_price both have more than 70% missing values, so we drop these two features
airbnb2<-airbnb2[,!colnames(airbnb2) %in% c("weekly_price","monthly_price")]

For price, we use only the daily price of each property, because weekly_price and monthly_price have too many missing values.

zipcode and cleaning_fee both have a small share of missing values (about 1% and 20%, respectively); we deal with these in the data cleaning step.
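The missing-value shares above can also be computed in one vectorized step: is.na() on a data frame returns a logical matrix, and colMeans() turns each column into its share of TRUE values. A minimal sketch on a small hypothetical data frame (`toy`), dropping columns above the 70% threshold used above:

```r
# Hypothetical stand-in for airbnb2
toy <- data.frame(price = c(100, 150, 200, 250),
                  weekly_price = c(NA, NA, NA, 700),
                  cleaning_fee = c(50, NA, 60, 70))

# Share of missing values per column (same quantity as the sapply() call above)
missing_share <- colMeans(is.na(toy))
missing_share

# Keep only columns whose missing share is at most 70%
toy_kept <- toy[, missing_share <= 0.7, drop = FALSE]
names(toy_kept)
```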

Outlier

airbnb2 %>% mutate(price=as.numeric(gsub("[$,]","",as.character(price)))) %>% 
                  ggplot(aes(x = neighbourhood_group_cleansed, y = price)) +
                  geom_boxplot() +
                  labs(title = "Neighbourhood vs Price", x = "Neighbourhood", 
                       y = "Price")

The boxplot shows clear outliers in the Brooklyn, Queens, and Manhattan prices, which we deal with in the data cleaning step.
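The cleaning code removes prices flagged by boxplot.stats() computed over all listings at once. Because typical prices differ between zip codes, one global set of whiskers can miss a value that is extreme only within its own zip code. A minimal sketch on toy prices for two hypothetical zip codes (A and B) compares the global rule with a per-zip variant built on base R's ave(); the per-zip version is an alternative, not the method used in this analysis:

```r
# Toy nightly prices for two hypothetical zip codes
prices <- data.frame(zipcode = rep(c("A", "B"), each = 6),
                     price = c(100, 110, 105, 95, 102, 400,    # zip A: 400 is extreme for A
                               300, 310, 290, 305, 295, 900))  # zip B: 900 is extreme for B

# Global rule (as in the cleaning code): one set of whiskers for all rows
global_out <- prices$price %in% boxplot.stats(prices$price)$out

# Per-zip rule: whiskers computed separately within each zip code
per_zip_out <- as.logical(ave(prices$price, prices$zipcode,
                              FUN = function(p) p %in% boxplot.stats(p)$out))

prices$price[global_out]   # only 900
prices$price[per_zip_out]  # 400 and 900
```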

Zillow Dataset

Missing Value

#Select important features in Zillow dataset
zillow2<- subset(zillow[,c(2,3,7,ncol(zillow))],zillow$City=="New York")

#Check missing value for important features in Zillow dataset
missing_zillow<-as.data.frame(sapply(zillow2, function(x) sum(is.na (x))/nrow(zillow2)))
names(missing_zillow)[1]<-"missing"
missing_zillow

           missing
RegionName       0
City             0
SizeRank         0
X2017.06         0

We keep only the latest purchase price (X2017.06, June 2017), because it is closest in time to the price data in the Airbnb dataset. The Zillow data are tidy, and the selected features have no missing values.
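The Zillow selection above takes the last column (ncol(zillow)) as the latest month. A minimal sketch of choosing it by name instead, assuming the monthly columns follow the X<year>.<month> pattern seen in the overview table; the toy frame and its values are illustrative only:

```r
# Toy frame mimicking Zillow's wide layout (illustrative values, one column per month)
zillow_toy <- data.frame(RegionName = c(10025, 10023),
                         X2017.04 = c(100, 200),
                         X2017.05 = c(110, 210),
                         X2017.06 = c(120, 220))

# Monthly columns are named X<year>.<month>; zero-padded months sort correctly as text
month_cols <- grep("^X[0-9]{4}\\.[0-9]{2}$", names(zillow_toy), value = TRUE)
latest_col <- max(month_cols)

# Rename the latest month to purchase_price, as in the cleaning step
names(zillow_toy)[names(zillow_toy) == latest_col] <- "purchase_price"
```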

Data Cleaning


Summary of data cleaning

(1) Airbnb Dataset:

zipcode:
Keep only zip codes with 5 digits
Remove missing values

Price:
Remove the dollar sign and convert to numeric
No missing values to remove (see the missing-value check)

Cleaning fee:
Remove the dollar sign and convert to numeric
Fill missing values with the mean cleaning fee

others:
Keep only listings whose “is_location_exact” is true
Remove price outliers flagged by boxplot.stats() across all listings (roughly, prices more than 1.5 × IQR beyond the quartiles)

(2) Zillow Dataset:

zipcode:
Keep only zip codes with 5 digits
Rename the column RegionName to zipcode
Rename the column X2017.06 to purchase_price
Keep only zip codes in New York City

Price:
Confirm there are no missing values
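Both datasets are filtered on zip codes of string length 5. A length check alone accepts any five characters, so a stricter digits-only check may be worth considering. A minimal sketch on toy values; `is_valid_zip()` and `truncate_zip4()` are hypothetical helpers, and keeping ZIP+4 codes (assumed to look like 11103-3233) by truncating them is an alternative to dropping them, not what this analysis does:

```r
zips <- c("10025", "11103-3233", "1120", "NY 10", NA)

# A length check alone accepts "NY 10" (five characters, but not a zip code)
nchar("NY 10") == 5

# Stricter check: exactly five digits; NA counts as invalid
is_valid_zip <- function(z) !is.na(z) & grepl("^[0-9]{5}$", z)
is_valid_zip(zips)

# Alternative: keep ZIP+4 codes by truncating them to the first five digits
truncate_zip4 <- function(z) sub("^([0-9]{5})-[0-9]{4}$", "\\1", z)
is_valid_zip(truncate_zip4(zips))
```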

Airbnb Dataset

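airbnb_final <- airbnb2 %>%
  # Strip "$" and "," before converting (as.numeric() on a factor returns level codes)
  mutate(price = as.numeric(gsub("[$,]", "", as.character(price))),
         cleaning_fee = as.numeric(gsub("[$,]", "", as.character(cleaning_fee)))) %>%
  # Impute missing cleaning fees with the mean of the observed fees
  mutate(cleaning_fee = ifelse(is.na(cleaning_fee), mean(cleaning_fee, na.rm = TRUE), cleaning_fee)) %>%
  # Keep listings with an exact location and a 5-digit zip code, then drop global price outliers
  filter(is_location_exact == "t" & !is.na(zipcode) & str_length(zipcode) == 5) %>%
  filter(!price %in% boxplot.stats(price)$out)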


kable(head(airbnb_final)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed","responsive"))

neighbourhood_group_cleansed	zipcode	country_code	latitude	longitude	is_location_exact	bedrooms	price	cleaning_fee	number_of_reviews
Bronx	10462	US	40.85753	-73.86605	t	2	65	85.82307	4
Bronx	10469	US	40.87054	-73.84681	t	2	87	75.00000	31
Queens	11102	US	40.77172	-73.91769	t	2	151	85.82307	0
Queens	11102	US	40.77937	-73.91553	t	2	205	85.82307	0
Queens	11105	US	40.78080	-73.91025	t	2	516	400.00000	1
Queens	11105	US	40.77862	-73.92149	t	2	178	95.00000	15
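Airbnb prices are stored as strings such as "$1,200.00". When read.csv() imports them as factors (the default before R 4.0; the overview above shows a <fct> column), as.numeric() returns the factor's internal level codes rather than the printed amounts, and removing only the "$" leaves a comma in amounts of $1,000 or more, which then convert to NA. A minimal sketch of a safer conversion with a hypothetical helper, `parse_dollars()`:

```r
# Hypothetical helper: strip "$" and thousands separators, then convert to numeric
parse_dollars <- function(x) as.numeric(gsub("[$,]", "", as.character(x)))

fees <- factor(c("$99.00", "$1,200.00", "$75.00"))

as.numeric(fees)     # internal level codes, not dollar amounts
parse_dollars(fees)  # 99, 1200, 75
```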

Zillow Dataset

zillow_final<- zillow2%>%
  rename(zipcode=RegionName,purchase_price=X2017.06)%>%
  filter(!is.na(zipcode) & (str_length(zipcode))==5)

kable(head(zillow_final)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed","responsive"))

zipcode	City	SizeRank	purchase_price
10025	New York	1	1431000
10023	New York	3	2142300
10128	New York	14	1787100
10011	New York	15	2480400
10003	New York	21	2147000
11201	New York	32	1420700

Merge Two Datasets

data<-merge(x=airbnb_final, y=zillow_final, by="zipcode")

kable(head(data)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed","responsive"))

zipcode	neighbourhood_group_cleansed	country_code	latitude	longitude	is_location_exact	bedrooms	price	cleaning_fee	number_of_reviews	City	SizeRank	purchase_price
10003	Manhattan	US	40.73797	-73.98778	t	2	412	300.00000	1	New York	21	2147000
10003	Manhattan	US	40.72650	-73.98834	t	2	88	30.00000	12	New York	21	2147000
10003	Manhattan	US	40.73682	-73.98336	t	2	136	75.00000	1	New York	21	2147000
10003	Manhattan	US	40.72513	-73.98952	t	2	244	109.00000	86	New York	21	2147000
10003	Manhattan	US	40.72917	-73.98811	t	2	162	80.00000	134	New York	21	2147000
10003	Manhattan	US	40.73019	-73.98630	t	2	378	85.82307	0	New York	21	2147000
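merge() with only by= performs an inner join: listings whose zip code has no Zillow value, and Zillow zip codes with no listings, are silently dropped. A minimal sketch on toy data (the listing prices are hypothetical; the purchase prices come from the tables above) shows what is kept, and how all.x = TRUE would keep unmatched listings instead:

```r
listings <- data.frame(zipcode = c("10025", "10025", "11234", "10001"),
                       price = c(200, 210, 150, 300),
                       stringsAsFactors = FALSE)
values <- data.frame(zipcode = c("10025", "11234", "11201"),
                     purchase_price = c(1431000, 476900, 1420700),
                     stringsAsFactors = FALSE)

# Inner join (the default): 10001 has no Zillow value, 11201 has no listings
inner <- merge(listings, values, by = "zipcode")

# Left join: keeps every listing, with NA purchase_price where Zillow has no match
left <- merge(listings, values, by = "zipcode", all.x = TRUE)
```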

Data Analysis


Distribution

The map shows that most listings are in Manhattan, while other areas have far fewer. More listings mean more opportunities and more growth, whereas the other areas may be less competitive but are much cheaper.

# Popup text for each listing marker
myPopup <- paste(sep = '<br/>',
                 paste('<b>Zipcode</b>:', data$zipcode),
                 paste('<b>Neighborhood</b>:', data$neighbourhood_group_cleansed),
                 paste('<b>Rent price</b>:', data$price),
                 paste('<b>Purchase price</b>:', data$purchase_price),
                 paste('<b>Population rank</b>:', data$SizeRank))
# Clustered circle markers; leaflet picks up the latitude/longitude columns automatically
leaflet() %>%
  addProviderTiles(providers$Esri.NatGeoWorldMap) %>%
  addCircleMarkers(data = data, radius = 10, opacity = 1, popup = myPopup,
                   clusterOptions = markerClusterOptions())

Feature Engineering

Additional Assumptions

1. The rental price and cleaning fee increase by 3% per year over the period we investigate.
2. Expenses are not taken into account.
3. A cleaning fee is charged on 60% of occupied nights.
4. The properties are available to rent throughout the year.
5. The occupancy rate is 75%.

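Formula

\[\text{Revenue}_{1} = 365 \times \text{OccupancyRate} \times (\text{DailyPrice} + 0.6 \times \text{CleaningFee})\]

\[\text{ExpectedRevenue}_{n} = \text{Revenue}_{1} \times \frac{1-1.03^{n}}{1-1.03}\]

\[\text{ReturnRate}_{n} = \frac{\text{ExpectedRevenue}_{n}-\text{TotalCost}}{\text{TotalCost}}\]

\[\text{BreakEven} = \log_{1.03}\left(1-\frac{\text{TotalCost}\,(1-1.03)}{\text{Revenue}_{1}}\right)\]

where n is the number of years since purchase, Revenue_1 is the first-year revenue, and OccupancyRate = 0.75.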
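The formulas translate directly into small R functions. A minimal sketch with hypothetical helper names, plugging in the rounded zip code 10025 averages from the feature table (mean_rent_price 205.6465, mean_cleaning_fee 92.38135, purchase_price 1431000); the results should match that table's revenue_y01, return_rate_y10, and break_even values up to rounding. The last line checks that the closed-form geometric sum equals adding up ten yearly revenues that grow by 3% each year.

```r
# Hypothetical helpers implementing the formulas above
# (occupancy 0.75, cleaning fee on 60% of occupied nights, 3% annual growth)
first_year_revenue <- function(daily_price, cleaning_fee, occupancy = 0.75, clean_share = 0.6) {
  365 * occupancy * (daily_price + clean_share * cleaning_fee)
}
# Sum of a geometric series: year k earns r1 * growth^(k - 1)
cumulative_revenue <- function(r1, years, growth = 1.03) r1 * (1 - growth^years) / (1 - growth)
return_rate <- function(r1, cost, years, growth = 1.03) cumulative_revenue(r1, years, growth) / cost - 1
break_even <- function(r1, cost, growth = 1.03) log(1 - cost * (1 - growth) / r1, base = growth)

# Zip code 10025, using the rounded averages reported in the feature table
r1 <- first_year_revenue(205.6465, 92.38135)
be <- break_even(r1, 1431000)
r1                            # about 71469
be                            # about 15.9 years
return_rate(r1, 1431000, 10)  # about -0.43
cumulative_revenue(r1, be)    # recovers the purchase price
# The closed form matches the year-by-year sum
sum(r1 * 1.03^(0:9)) - cumulative_revenue(r1, 10)
```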

Create new features

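# Model parameters from the assumptions above
occupancy   <- 0.75  # share of nights booked
clean_share <- 0.6   # share of occupied nights that incur a cleaning fee
growth      <- 1.03  # annual growth of rent and cleaning fee

# Cumulative n-year return on the cash purchase price
cum_return <- function(rev1, cost, n) (rev1 * (1 - growth^n) / (1 - growth)) / cost - 1

data2 <- data %>%
  group_by(zipcode) %>%
  summarise(mean_rent_price = mean(price), mean_reviews = mean(number_of_reviews),
            mean_cleaning_fee = mean(cleaning_fee), purchase_price = mean(purchase_price),
            neighbourhood = first(neighbourhood_group_cleansed), SizeRank = first(SizeRank)) %>%
  mutate(occupancy_rate = occupancy,
         revenue_y01 = (mean_rent_price + clean_share * mean_cleaning_fee) * 365 * occupancy,
         return_rate_y01 = cum_return(revenue_y01, purchase_price, 1),
         return_rate_y05 = cum_return(revenue_y01, purchase_price, 5),
         return_rate_y10 = cum_return(revenue_y01, purchase_price, 10),
         return_rate_y20 = cum_return(revenue_y01, purchase_price, 20),
         return_rate_y30 = cum_return(revenue_y01, purchase_price, 30),
         break_even = log(1 - purchase_price * (1 - growth) / revenue_y01, base = growth))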

kable(data2) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed","responsive"))

zipcode	mean_rent_price	mean_reviews	mean_cleaning_fee	purchase_price	neighbourhood	SizeRank	occupancy_rate	revenue_y01	return_rate_y01	return_rate_y05	return_rate_y10	return_rate_y20	return_rate_y30	break_even
10003	231.3796	26.583333	94.06417	2147000	Manhattan	21	0.75	78790.21	-0.9633022	-0.8051663	-0.5793007	-0.0139159	0.7459139	20.212421
10011	263.5057	12.206897	107.09186	2480400	Manhattan	15	0.75	89724.54	-0.9638266	-0.8079504	-0.5853123	-0.0280068	0.7209652	20.432301
10013	248.3286	18.700000	101.40835	3316500	Manhattan	1744	0.75	84636.27	-0.9744802	-0.8645122	-0.7074446	-0.3142746	0.2141130	26.296263
10014	244.3827	16.493827	95.23447	2491600	Manhattan	379	0.75	82542.03	-0.9668719	-0.8241183	-0.6202232	-0.1098349	0.5760842	21.813597
10021	147.8000	19.266667	73.88615	1815600	Manhattan	190	0.75	52596.05	-0.9710310	-0.8461998	-0.6679033	-0.2215931	0.3782105	24.046517
10022	243.9000	25.800000	101.29795	2031600	Manhattan	894	0.75	83405.81	-0.9589458	-0.7820374	-0.5293590	0.1031431	0.9531730	18.557901
10023	236.2037	13.666667	101.67094	2142300	Manhattan	3	0.75	81360.22	-0.9620220	-0.7983698	-0.5646251	0.0204824	0.8068179	19.695567
10025	205.6465	17.525253	92.38135	1431000	Manhattan	1	0.75	71469.36	-0.9500564	-0.7348424	-0.4274521	0.3420045	1.3760897	15.914944
10028	214.1379	5.137931	97.34270	2083900	Manhattan	109	0.75	74608.80	-0.9641975	-0.8099197	-0.5895646	-0.0379738	0.7033181	20.590831
10036	267.6071	25.297619	103.19038	1712900	Manhattan	580	0.75	90206.48	-0.9473370	-0.7204049	-0.3962776	0.4150749	1.5054648	15.252949
10128	208.0208	6.854167	81.83077	1787100	Manhattan	14	0.75	70386.41	-0.9606142	-0.7908953	-0.5484857	0.0583118	0.8737970	19.157631
10304	278.5000	1.000000	52.91153	328300	Staten Island	1958	0.75	84930.09	-0.7413034	0.3734554	1.9656666	5.9512746	11.3075984	3.711935
10305	90.0000	3.333333	60.13718	425100	Staten Island	2087	0.75	34515.03	-0.9188073	-0.5689367	-0.0692163	1.1816792	2.8627781	10.637754
10306	269.0000	71.500000	65.00000	352900	Staten Island	668	0.75	84315.00	-0.7610796	0.2684607	1.7389543	5.4198799	10.3667361	4.001684
10308	473.0000	32.000000	25.00000	409500	Staten Island	4149	0.75	133590.00	-0.6737729	0.7319840	2.7398282	7.7658445	14.5203902	2.976260
10312	167.0000	0.000000	85.82307	355000	Staten Island	764	0.75	59812.69	-0.8315136	-0.1054826	0.9315083	3.5272939	7.0158127	5.543713
11201	170.8254	18.206349	85.91404	1420700	Brooklyn	32	0.75	60874.83	-0.9571515	-0.7725116	-0.5087902	0.1513547	1.0385342	17.954432
11215	144.4769	14.253846	82.19266	1070800	Brooklyn	71	0.75	53050.70	-0.9504569	-0.7369692	-0.4320444	0.3312404	1.3570314	16.017441
11217	165.6790	23.135803	88.92877	1302300	Brooklyn	1555	0.75	59961.18	-0.9539575	-0.7555540	-0.4721740	0.2371799	1.1904923	16.973871
11231	166.1719	16.437500	89.21682	1202900	Brooklyn	1817	0.75	60143.41	-0.9500013	-0.7345502	-0.4268211	0.3434833	1.3787081	15.900966
11234	156.2500	19.500000	80.41153	476900	Brooklyn	52	0.75	55981.03	-0.8826147	-0.3767857	0.3456905	2.1541860	4.5846527	7.699521
11434	335.0000	16.000000	46.10769	382300	Queens	622	0.75	99279.44	-0.7403101	0.3787288	1.9770533	5.9779641	11.3548536	3.698486
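The ranking behind the recommendation can be reproduced from the table alone. A minimal sketch that copies the break_even column above into a base R data frame (`be_table`, so it runs without the original CSV files), lists the ten shortest break-even periods, and takes the median per borough:

```r
# Break-even periods (years) copied from the table above
be_table <- data.frame(
  zipcode = c("10003", "10011", "10013", "10014", "10021", "10022", "10023", "10025",
              "10028", "10036", "10128", "10304", "10305", "10306", "10308", "10312",
              "11201", "11215", "11217", "11231", "11234", "11434"),
  neighbourhood = rep(c("Manhattan", "Staten Island", "Brooklyn", "Queens"), times = c(11, 5, 5, 1)),
  break_even = c(20.212421, 20.432301, 26.296263, 21.813597, 24.046517, 18.557901,
                 19.695567, 15.914944, 20.590831, 15.252949, 19.157631, 3.711935,
                 10.637754, 4.001684, 2.976260, 5.543713, 17.954432, 16.017441,
                 16.973871, 15.900966, 7.699521, 3.698486),
  stringsAsFactors = FALSE
)

# Ten zip codes with the shortest break-even period
top10 <- head(be_table[order(be_table$break_even), ], 10)
top10

# Median break-even period by borough
med <- aggregate(break_even ~ neighbourhood, data = be_table, FUN = median)
med
```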

Analysis and Conclusion

Summary:

Although Staten Island has the highest return rates and the shortest break-even periods, it has far fewer potential guests than Manhattan and Brooklyn.
Manhattan has the highest property costs and much longer break-even periods than other areas, but some of its zip codes are still worth investing in: Manhattan real estate prices are more likely to rise, and higher population density potentially means a higher occupancy rate.

Recommendation:

I recommend zip code 10025 in Manhattan and zip code 11234 in Brooklyn, for the following reasons:

Population: Both zip codes rank near the top of the population size ranking (SizeRank 1 and 52, where a lower rank means a larger population) compared with the other zip codes.

Break-even period: Both are among the ten zip codes with the shortest break-even periods, and both break even in under 20 years (about 7.7 years for 11234 and 15.9 years for 10025).

Return rate: Over ten years, 11234 reaches a positive cumulative return of about 35%, while 10025 (about -43%) has the second-highest 10-year return among the Manhattan zip codes, after 10036.

Total cost: Each is the cheapest zip code in its borough in this data ($1,431,000 for 10025 and $476,900 for 11234), so the company can better control the investment budget.

Reviews: Both have a middle-to-high average number of reviews (about 17.5 for 10025 and 19.5 for 11234), which suggests that guests are more willing to stay there than in many other Manhattan and Brooklyn zip codes.
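The total-cost point can be checked directly against the feature table. A minimal sketch, using the Manhattan and Brooklyn purchase prices copied from that table, that picks the cheapest zip code in each borough with base R's split():

```r
# Purchase prices copied from the feature table (Manhattan and Brooklyn rows only)
costs <- data.frame(
  zipcode = c("10003", "10011", "10013", "10014", "10021", "10022", "10023", "10025",
              "10028", "10036", "10128", "11201", "11215", "11217", "11231", "11234"),
  neighbourhood = rep(c("Manhattan", "Brooklyn"), times = c(11, 5)),
  purchase_price = c(2147000, 2480400, 3316500, 2491600, 1815600, 2031600, 2142300, 1431000,
                     2083900, 1712900, 1787100, 1420700, 1070800, 1302300, 1202900, 476900),
  stringsAsFactors = FALSE
)

# For each borough, keep the row with the lowest purchase price
cheapest <- do.call(rbind, lapply(split(costs, costs$neighbourhood),
                                  function(d) d[which.min(d$purchase_price), ]))
cheapest
```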

Future Work

In the future, we need to investigate:

How does the macroeconomic environment, for example a change in the number of tourists, affect the return rate?
How should the analysis change if the company plans to sell its properties at some point in the future? In that case, we need to model more financial variables, such as the appreciation of the property.
What other expenses will the investment incur, for example taxes, maintenance costs, and marketing costs?
Would buying more properties create economies of scale that increase revenue?
What is the company’s market strategy? Does it want to target the high-price market with a smaller sales volume, or the low-price market with a larger sales volume?


Breakeven

breakeven<-data2%>%
  ggplot(aes(x = reorder(zipcode, break_even), y = break_even,fill=neighbourhood)) +
  geom_bar(stat = "identity") +
  scale_fill_brewer(palette="Blues")+
  labs(title="Average Breakeven Period vs Zipcodes",x = "Zipcode", y = "Breakeven Period")+
  theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5),
        legend.position = "bottom",
        legend.title = element_text(face = "bold"),
        panel.background = element_rect(fill = "white", colour = "white"),
        panel.grid.major = element_line(colour = "grey70",linetype = "dashed"),
        panel.grid.major.y = element_blank(),
        axis.line = element_line(size = 1, colour = "grey80"),
        axis.text = element_text(colour = "peru"),
        axis.ticks.length = unit(0.25, "cm"),
        axis.ticks = element_line(size = 0.2))+
  coord_flip()

breakeven

Return_rate

return_rate_y10_plot<-
  ggplot(data=data2,aes(x = zipcode, y = return_rate_y10,fill=neighbourhood)) +
  geom_col() +
  scale_fill_brewer(palette="Blues")+
  labs(title="Return rate over 10 years vs Zipcodes",x = "Zipcode", y = "Return rate")+
  theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5),
        legend.position = "bottom",
        legend.title = element_text(face = "bold"),
        panel.background = element_rect(fill = "white", colour = "white"),
        panel.grid.major = element_line(colour = "grey70",linetype = "dashed"),
        panel.grid.major.y = element_blank(),
        axis.line = element_line(size = 1, colour = "grey80"),
        axis.text = element_text(colour = "peru"),
        axis.ticks.length = unit(0.25, "cm"),
        axis.ticks = element_line(size = 0.2))+
  coord_flip()
return_rate_y10_plot

Revenue

revenue_year<-
  ggplot(data=data2,aes(x = zipcode, y = revenue_y01,fill=neighbourhood)) +
  geom_col() +
  scale_fill_brewer(palette="Blues")+
  labs(title="Average Revenue/year vs Zipcodes",x = "Zipcode", y = "Average Revenue")+
    theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5),
        legend.position = "bottom",
        legend.title = element_text(face = "bold"),
        panel.background = element_rect(fill = "white", colour = "white"),
        panel.grid.major = element_line(colour = "grey70",linetype = "dashed"),
        panel.grid.major.y = element_blank(),
        axis.line = element_line(size = 1, colour = "grey80"),
        axis.text = element_text(colour = "peru"),
        axis.ticks.length = unit(0.25, "cm"),
        axis.ticks = element_line(size = 0.2))+
  coord_flip()

revenue_year

Total_cost

Total_cost<-data2%>%
  ggplot(aes(x = zipcode, y = purchase_price,fill=neighbourhood)) +
  geom_col() +
  scale_fill_brewer(palette="Blues")+
  labs(title="Total Cost vs Zipcodes",x = "Zipcode", y = "Total Cost")+
    theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5),
        legend.position = "bottom",
        legend.title = element_text(face = "bold"),
        panel.background = element_rect(fill = "white", colour = "white"),
        panel.grid.major = element_line(colour = "grey70",linetype = "dashed"),
        panel.grid.major.y = element_blank(),
        axis.line = element_line(size = 1, colour = "grey80"),
        axis.text = element_text(colour = "peru"),
        axis.ticks.length = unit(0.25, "cm"),
        axis.ticks = element_line(size = 0.2))+
  coord_flip()

Total_cost

Population

sizerank<-data2%>%
  ggplot(aes(x = zipcode, y = SizeRank,fill=neighbourhood)) +
  geom_col() +
  scale_fill_brewer(palette="Blues")+
  labs(title="Population Rank vs Zipcodes",x = "Zipcode", y = "Population Rank")+
    theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5),
        legend.position = "bottom",
        legend.title = element_text(face = "bold"),
        panel.background = element_rect(fill = "white", colour = "white"),
        panel.grid.major = element_line(colour = "grey70",linetype = "dashed"),
        panel.grid.major.y = element_blank(),
        axis.line = element_line(size = 1, colour = "grey80"),
        axis.text = element_text(colour = "peru"),
        axis.ticks.length = unit(0.25, "cm"),
        axis.ticks = element_line(size = 0.2))+
  coord_flip()

sizerank

Review

number_review<- data2%>%
  ggplot(aes(x = zipcode, y = mean_reviews,fill=neighbourhood)) +
  geom_col() +
  scale_fill_brewer(palette="Blues")+
  labs(title="Number of reviews vs Zipcodes",x = "Zipcode", y = "Number of Reviews")+
    theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5),
        legend.position = "bottom",
        legend.title = element_text(face = "bold"),
        panel.background = element_rect(fill = "white", colour = "white"),
        panel.grid.major = element_line(colour = "grey70",linetype = "dashed"),
        panel.grid.major.y = element_blank(),
        axis.line = element_line(size = 1, colour = "grey80"),
        axis.text = element_text(colour = "peru"),
        axis.ticks.length = unit(0.25, "cm"),
        axis.ticks = element_line(size = 0.2))+
  coord_flip()

number_review
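The charts above all repeat the same theme() call. A minimal sketch that defines the styling once and wraps the chart in a hypothetical helper, `bar_by_zip()`; it uses geom_col(), the ggplot2 geom for bars whose heights come straight from the data, and linewidth, which replaced size for line elements in ggplot2 3.4.0 (so this sketch assumes ggplot2 3.4.0 or later). The toy frame reuses two revenue_y01 values from the feature table.

```r
library(ggplot2)

# Styling shared by the dashboard bar charts (same settings as the theme() calls above)
theme_dashboard <- theme(
  plot.title = element_text(size = 12, face = "bold", hjust = 0.5),
  legend.position = "bottom",
  legend.title = element_text(face = "bold"),
  panel.background = element_rect(fill = "white", colour = "white"),
  panel.grid.major = element_line(colour = "grey70", linetype = "dashed"),
  panel.grid.major.y = element_blank(),
  axis.line = element_line(linewidth = 1, colour = "grey80"),
  axis.text = element_text(colour = "peru"),
  axis.ticks.length = unit(0.25, "cm"),
  axis.ticks = element_line(linewidth = 0.2)
)

# Hypothetical helper: horizontal bar chart of one metric per zip code
bar_by_zip <- function(df, metric, title, ylab) {
  ggplot(df, aes(x = zipcode, y = .data[[metric]], fill = neighbourhood)) +
    geom_col() +
    scale_fill_brewer(palette = "Blues") +
    labs(title = title, x = "Zipcode", y = ylab) +
    theme_dashboard +
    coord_flip()
}

# Usage with two rows from the feature table
toy <- data.frame(zipcode = c("10025", "11234"),
                  neighbourhood = c("Manhattan", "Brooklyn"),
                  revenue_y01 = c(71469.36, 55981.03))
p <- bar_by_zip(toy, "revenue_y01", "Average Revenue/year vs Zipcodes", "Average Revenue")
```

With data2 in scope, the revenue chart above would become bar_by_zip(data2, "revenue_y01", "Average Revenue/year vs Zipcodes", "Average Revenue"), and the other charts follow the same pattern.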