Goal:
A real estate company specializes in purchasing two-bedroom properties in New York City to rent out short-term. It wants a data product, with a well-reasoned conclusion, that identifies which zip codes will generate the most profit on short-term rentals within New York City.
Datasets:
Cost data: Zillow provides an estimate of the value of two-bedroom properties.
Revenue data: AirBnB is the medium through which the investor plans to lease out their investment property.
Basic Assumptions:
The investor will pay for the property in cash (i.e. no mortgage/interest rate will need to be accounted for).
The time value of money discount rate is 0% (i.e. $1 today is worth the same 100 years from now).
All properties and all square feet within each locale can be assumed to be homogeneous (i.e. a 1000 square foot property in a locale such as Bronx or Manhattan generates twice the revenue and costs twice as much as any other 500 square foot property within that same locale.)
Additional Assumption:
1. The rental price and cleaning fee increase 3% per year over the period we investigate.
2. Expenses are not taken into account.
3. Assume that 60% of occupied nights incur a cleaning fee.
4. Assume that the properties are available throughout the year.
The whole process will be:
Data Overview: understand the size of the data, the basic variables and their relationships.
Data Quality Check: check for suspicious records, missing values, inconsistent data formats, outliers, etc.
Data Cleaning: fix the problems found in the data quality check.
Data Analysis: create new features that help analyze the problem, visualize the data, draw conclusions and recommendations, and outline future work to improve the analysis.
#Load Packages
library(flexdashboard) #For dashboard generation
library(dplyr) #For data cleaning
library(leaflet) #For data mapping
library(stringr) #For dealing with string datatype
library(ggplot2) #For data visualization
library(psych)
library(kableExtra) #For neat table
#Import dataset
zillow<-read.csv("Zip_Zhvi_2bedroom.csv",header=TRUE,na.strings = "")
airbnb<-read.csv("listings.csv",header=TRUE,na.strings = "")
#Dimension
dim(airbnb)
[1] 40753 95
#Overview on neighbourhood
airbnb %>%
group_by(neighbourhood_group_cleansed) %>%
summarise(unique_zipcodes = n_distinct(zipcode),
number_properties = n())
# A tibble: 5 x 3
neighbourhood_group_cleansed unique_zipcodes number_properties
<fct> <int> <int>
1 Bronx 29 649
2 Brooklyn 46 16810
3 Manhattan 70 19212
4 Queens 69 3821
5 Staten Island 12 261
Airbnb Dataset: 40753 rows and 95 columns.
Brooklyn and Manhattan have many more properties than the Bronx, Queens, and Staten Island.
#Overview on Dataset
kable(head(airbnb[c("neighbourhood_group_cleansed","zipcode","calendar_updated","room_type","host_verifications","bedrooms","price","weekly_price","cleaning_fee","number_of_reviews")])) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed","responsive"))
| neighbourhood_group_cleansed | zipcode | calendar_updated | room_type | host_verifications | bedrooms | price | weekly_price | cleaning_fee | number_of_reviews |
|---|---|---|---|---|---|---|---|---|---|
| Bronx | 10464 | yesterday | Private room | [‘email’, ‘phone’, ‘reviews’, ‘jumio’] | 1 | $99.00 | NA | NA | 25 |
| Bronx | 10464 | 6 months ago | Private room | [‘phone’, ‘facebook’] | 1 | $200.00 | NA | NA | 0 |
| Bronx | 10464 | 11 months ago | Entire home/apt | [‘email’, ‘phone’] | 3 | $300.00 | NA | $100.00 | 0 |
| Bronx | 10464 | 2 weeks ago | Entire home/apt | [‘email’, ‘phone’, ‘reviews’, ‘kba’] | 1 | $125.00 | $775.00 | $75.00 | 12 |
| Bronx | 10464 | yesterday | Private room | [‘email’, ‘phone’, ‘reviews’, ‘jumio’] | 1 | $69.00 | $350.00 | $17.00 | 86 |
| Bronx | 10464 | a week ago | Entire home/apt | [‘email’, ‘phone’, ‘facebook’, ‘reviews’] | 0 | $125.00 | $550.00 | $35.00 | 41 |
#Dimension
dim(zillow)
[1] 8946 262
#Overview on city and region
zillow %>%
summarise(unique_cities = n_distinct(City),
unique_zipcodes = n_distinct(RegionName))
unique_cities unique_zipcodes
1 4684 8946
Zillow Dataset: 8946 rows and 262 columns.
Zillow data cover 4684 cities while we only need New York City data.
#Dataset
kable(head(zillow[,1:15])) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed","responsive"))
| RegionID | RegionName | City | State | Metro | CountyName | SizeRank | X1996.04 | X1996.05 | X1996.06 | X1996.07 | X1996.08 | X1996.09 | X1996.10 | X1996.11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 61639 | 10025 | New York | NY | New York | New York | 1 | NA | NA | NA | NA | NA | NA | NA | NA |
| 84654 | 60657 | Chicago | IL | Chicago | Cook | 2 | 167700 | 166400 | 166700 | 167200 | 166900 | 166900 | 168000 | 170100 |
| 61637 | 10023 | New York | NY | New York | New York | 3 | NA | NA | NA | NA | NA | NA | NA | NA |
| 84616 | 60614 | Chicago | IL | Chicago | Cook | 4 | 195800 | 193500 | 192600 | 192300 | 192600 | 193600 | 195500 | 197600 |
| 93144 | 79936 | El Paso | TX | El Paso | El Paso | 5 | 59100 | 60500 | 60900 | 60800 | 60300 | 60400 | 61200 | 61700 |
| 84640 | 60640 | Chicago | IL | Chicago | Cook | 6 | 123300 | 122600 | 122000 | 121500 | 120900 | 120600 | 120900 | 121300 |
#Select important features of 2-bedrooms property in Airbnb dataset
airbnb2<-airbnb[c("neighbourhood_group_cleansed","zipcode","country_code","latitude","longitude","is_location_exact","bedrooms","price","weekly_price","monthly_price","cleaning_fee","number_of_reviews")]%>%filter(bedrooms==2)
#with 4892 rows
#Check missing value for important features in Airbnb dataset
missing_airbnb<-as.data.frame(sapply(airbnb2, function(x) sum(is.na (x))/nrow(airbnb2)))
names(missing_airbnb)[1]<-"missing"
missing_airbnb
missing
neighbourhood_group_cleansed 0.00000000
zipcode 0.01266857
country_code 0.00000000
latitude 0.00000000
longitude 0.00000000
is_location_exact 0.00000000
bedrooms 0.00000000
price 0.00000000
weekly_price 0.78933388
monthly_price 0.82611361
cleaning_fee 0.19738455
number_of_reviews 0.00000000
#weekly_price and monthly_price both have more than 70% missing values, so we drop these two features
airbnb2<-airbnb2[,!colnames(airbnb2) %in% c("weekly_price","monthly_price")]
For price data we only use the daily price of each property because weekly_price and monthly_price have too many missing values.
zipcode and cleaning_fee each have a small share of missing values (about 1% and 20% respectively); we will deal with these missing values later.
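The missing-value check above can also be written more compactly. The following is a minimal sketch on a made-up toy data frame (the values and the 70% threshold variable are illustrative assumptions, not taken from the Airbnb file); it relies on the fact that `colMeans()` of a logical matrix gives the share of `TRUE` values per column.

```r
# Toy data frame standing in for airbnb2 (values are made up for illustration)
toy <- data.frame(
  zipcode      = c("10025", NA, "11234", "10003"),
  weekly_price = c(NA, NA, "$775.00", NA),
  price        = c("$99.00", "$200.00", "$300.00", "$125.00"),
  stringsAsFactors = FALSE
)

# Share of missing values per column
missing_share <- colMeans(is.na(toy))

# Drop every column whose missing share exceeds the chosen threshold
threshold <- 0.7
to_drop   <- names(missing_share)[missing_share > threshold]
toy_clean <- toy[, !names(toy) %in% to_drop, drop = FALSE]

print(missing_share)
print(names(toy_clean))
```

Computing the list of columns to drop from the threshold, rather than typing the names by hand, keeps the rule and the code in step if the data changes.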
#Strip "$" and "," first; as.numeric() cannot parse the raw price strings
airbnb2 %>% mutate(price=as.numeric(gsub("[$,]","",price))) %>%
ggplot(aes(x = neighbourhood_group_cleansed, y = price)) +
geom_boxplot() +
labs(title = "Neighbourhood vs Price", x = "Neighbourhood",
y = "Price")
The boxplot shows clear outliers in the Brooklyn, Queens and Manhattan price data, so we need to deal with outliers later.
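To make the notion of an outlier concrete: the points drawn beyond the whiskers of a boxplot are the values that `boxplot.stats()` returns in `$out`, using the default rule of 1.5 times the hinge spread. A small sketch with made-up prices (not values from the dataset):

```r
# Made-up nightly prices; 2000 is far from the rest
prices <- c(100, 120, 130, 150, 160, 2000)

# boxplot.stats() uses coef = 1.5 by default: values further than
# 1.5 * (upper hinge - lower hinge) from the hinges are reported in $out
stats    <- boxplot.stats(prices)
outliers <- stats$out

# Keep only the values that are not flagged
kept <- prices[!prices %in% outliers]

print(outliers)
print(kept)
```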
#Select important features in Zillow dataset
zillow2<- subset(zillow[,c(2,3,7,ncol(zillow))],zillow$City=="New York")
#Check missing value for important features in Zillow dataset
missing_zillow<-as.data.frame(sapply(zillow2, function(x) sum(is.na (x))/nrow(zillow2)))
names(missing_zillow)[1]<-"missing"
missing_zillow
missing
RegionName 0
City 0
SizeRank 0
X2017.06 0
Here we select only the latest purchase price (column X2017.06) because it best matches the time of the price data in the Airbnb dataset. The data we get from Zillow is tidy, and the features we need have no missing values.
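The selection above picks the latest month by position (`ncol(zillow)`). As a hedged alternative, the latest month column can be located by name, which keeps working if extra columns are appended to the file. This sketch uses a toy data frame; the X2017.05 values are made up, and the pattern assumes the `XYYYY.MM` naming seen in the Zillow columns.

```r
# Toy stand-in for the Zillow table (X2017.05 values are illustrative only)
toy_zillow <- data.frame(
  RegionName = c(10025, 10023),
  X2017.05   = c(1400000, 2100000),
  X2017.06   = c(1431000, 2142300)
)

# Month columns follow the pattern XYYYY.MM, so sorting them as text
# also sorts them chronologically
month_cols <- grep("^X[0-9]{4}\\.[0-9]{2}$", names(toy_zillow), value = TRUE)
latest_col <- max(month_cols)

latest_prices <- toy_zillow[, c("RegionName", latest_col)]
print(latest_col)
```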
Cleaning plan for the Airbnb dataset:
zipcode:
Zipcode should have 5 digits
Remove missing values
Price:
Remove missing values
Remove the dollar sign in price
Cleaning fee:
Fill missing values with the mean value
Remove the dollar sign in cleaning fee
Others:
Keep only rows where "is_location_exact" is true
Remove price outliers
Cleaning plan for the Zillow dataset:
zipcode:
Zipcode should have 5 digits
Rename the column from RegionName to zipcode
Rename the column from X2017.06 to purchase_price
Keep only the zipcodes in New York
Price:
Make sure there are no missing values
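The cleaning rules listed above can be sketched on a tiny made-up data frame before applying them to the real data. The helper name `parse_currency` and all toy values are assumptions introduced for illustration; the sketch uses base R only.

```r
# Hypothetical helper: turn strings such as "$1,200.00" into numbers
parse_currency <- function(x) as.numeric(gsub("[$,]", "", as.character(x)))

# Toy listings (made-up values) covering the cases in the cleaning plan
toy <- data.frame(
  zipcode           = c("10025", "1002", NA, "11234"),
  price             = c("$99.00", "$1,200.00", "$150.00", "$175.00"),
  cleaning_fee      = c("$50.00", NA, "$100.00", NA),
  is_location_exact = c("t", "t", "f", "t"),
  stringsAsFactors  = FALSE
)

# Remove currency symbols and convert to numeric
toy$price        <- parse_currency(toy$price)
toy$cleaning_fee <- parse_currency(toy$cleaning_fee)

# Fill missing cleaning fees with the mean of the observed ones
toy$cleaning_fee[is.na(toy$cleaning_fee)] <- mean(toy$cleaning_fee, na.rm = TRUE)

# Keep exact locations with a non-missing, 5-character zipcode
keep <- toy$is_location_exact == "t" & !is.na(toy$zipcode) & nchar(toy$zipcode) == 5
toy_clean <- toy[keep, ]

print(toy_clean)
```

Stripping both "$" and "," matters because prices of a thousand dollars or more contain a thousands separator that `as.numeric()` cannot parse.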
#Strip "$" and "," from price and cleaning_fee before converting to numeric
airbnb_final<- airbnb2%>%
mutate(price=as.numeric(gsub("[$,]","",price)),cleaning_fee=as.numeric(gsub("[$,]","",cleaning_fee)))%>%
mutate(cleaning_fee=ifelse(is.na(cleaning_fee),mean(cleaning_fee,na.rm=TRUE),cleaning_fee))%>%
filter(is_location_exact=="t" & !is.na(zipcode) & (str_length(zipcode))==5)%>%
filter(!price %in% boxplot.stats(price)$out)
kable(head(airbnb_final)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed","responsive"))
| neighbourhood_group_cleansed | zipcode | country_code | latitude | longitude | is_location_exact | bedrooms | price | cleaning_fee | number_of_reviews |
|---|---|---|---|---|---|---|---|---|---|
| Bronx | 10462 | US | 40.85753 | -73.86605 | t | 2 | 65 | 85.82307 | 4 |
| Bronx | 10469 | US | 40.87054 | -73.84681 | t | 2 | 87 | 75.00000 | 31 |
| Queens | 11102 | US | 40.77172 | -73.91769 | t | 2 | 151 | 85.82307 | 0 |
| Queens | 11102 | US | 40.77937 | -73.91553 | t | 2 | 205 | 85.82307 | 0 |
| Queens | 11105 | US | 40.78080 | -73.91025 | t | 2 | 516 | 400.00000 | 1 |
| Queens | 11105 | US | 40.77862 | -73.92149 | t | 2 | 178 | 95.00000 | 15 |
zillow_final<- zillow2%>%
rename(zipcode=RegionName,purchase_price=X2017.06)%>%
filter(!is.na(zipcode) & (str_length(zipcode))==5)
kable(head(zillow_final)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed","responsive"))
| zipcode | City | SizeRank | purchase_price |
|---|---|---|---|
| 10025 | New York | 1 | 1431000 |
| 10023 | New York | 3 | 2142300 |
| 10128 | New York | 14 | 1787100 |
| 10011 | New York | 15 | 2480400 |
| 10003 | New York | 21 | 2147000 |
| 11201 | New York | 32 | 1420700 |
data<-merge(x=airbnb_final, y=zillow_final, by="zipcode")
kable(head(data)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed","responsive"))
| zipcode | neighbourhood_group_cleansed | country_code | latitude | longitude | is_location_exact | bedrooms | price | cleaning_fee | number_of_reviews | City | SizeRank | purchase_price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10003 | Manhattan | US | 40.73797 | -73.98778 | t | 2 | 412 | 300.00000 | 1 | New York | 21 | 2147000 |
| 10003 | Manhattan | US | 40.72650 | -73.98834 | t | 2 | 88 | 30.00000 | 12 | New York | 21 | 2147000 |
| 10003 | Manhattan | US | 40.73682 | -73.98336 | t | 2 | 136 | 75.00000 | 1 | New York | 21 | 2147000 |
| 10003 | Manhattan | US | 40.72513 | -73.98952 | t | 2 | 244 | 109.00000 | 86 | New York | 21 | 2147000 |
| 10003 | Manhattan | US | 40.72917 | -73.98811 | t | 2 | 162 | 80.00000 | 134 | New York | 21 | 2147000 |
| 10003 | Manhattan | US | 40.73019 | -73.98630 | t | 2 | 378 | 85.82307 | 0 | New York | 21 | 2147000 |
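One property of the `merge()` call above is worth keeping in mind: with its default arguments it performs an inner join, so listings whose zipcode has no Zillow price are silently dropped. A minimal sketch, using a few values from the tables above plus a hypothetical zipcode 99999 that has no cost data:

```r
# Listings: two rows from zipcode 10003 and one hypothetical unmatched zipcode
listings <- data.frame(
  zipcode = c("10003", "10003", "99999"),
  price   = c(412, 88, 100),
  stringsAsFactors = FALSE
)

# Cost data for two zipcodes
costs <- data.frame(
  zipcode        = c("10003", "10025"),
  purchase_price = c(2147000, 1431000),
  stringsAsFactors = FALSE
)

# Default merge() keeps only zipcodes present in both tables
joined <- merge(x = listings, y = costs, by = "zipcode")

# Zipcodes that appear in the listings but have no cost data
unmatched <- setdiff(listings$zipcode, costs$zipcode)

print(joined)
print(unmatched)
```

Checking `unmatched` before the join shows how many listings fall outside the zipcodes covered by the cost data.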
From the map, we find that most properties are in Manhattan, while other areas have fewer listings. More properties suggest more opportunities and more growth; other areas may be less competitive, but they are much cheaper.
myPopup <- paste(sep='<br/>',
paste('<b>Zipcode</b>:',data$zipcode),
paste('<b>Neighborhood</b>:',data$neighbourhood_group_cleansed),
paste('<b>Rent price</b>:',data$price),
paste('<b>Purchase price</b>:',data$purchase_price),
paste('<b>Population</b>:',data$SizeRank))
leaflet() %>%addProviderTiles(providers$Esri.NatGeoWorldMap) %>%addCircleMarkers(data=data, radius=10, opacity=1, popup=myPopup,clusterOptions = markerClusterOptions())
1. The rental price and cleaning fee increase 3% per year over the period we investigate.
2. Expenses are not taken into account.
3. Assume that 60% of occupied nights incur a cleaning fee.
4. Assume that the properties are available throughout the year.
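\[\text{AnnualRevenue} = (\text{DailyPrice} + 0.6 \times \text{CleaningFee}) \times 365 \times \text{OccupancyRate}\]
\[\text{ExpectedRevenue}(n) = \text{AnnualRevenue} \times \frac{1 - 1.03^{n}}{1 - 1.03}\]
\[\text{ReturnRate}(n) = \frac{\text{ExpectedRevenue}(n) - \text{TotalCost}}{\text{TotalCost}}\]
\[\text{BreakEven} = \log_{1.03}\left(1 - \frac{\text{TotalCost} \times (1 - 1.03)}{\text{AnnualRevenue}}\right)\]
Here AnnualRevenue is the first-year revenue, n is the number of years (Days/365), and OccupancyRate is set to 0.75 in the calculation that follows.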
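As a worked check of these formulas, the sketch below evaluates them for zipcode 10003 using the averages reported in the results table (mean rent price 231.3796, mean cleaning fee 94.06417, purchase price 2147000). The function names are introduced here for illustration only; the constants 0.75, 0.6 and 1.03 are the ones used in the analysis code.

```r
growth         <- 1.03  # yearly growth of price and cleaning fee
occupancy      <- 0.75  # occupancy rate used in the analysis
cleaning_share <- 0.6   # share of occupied nights that incur a cleaning fee

# First-year revenue of one property
annual_revenue <- function(daily_price, cleaning_fee) {
  (daily_price + cleaning_share * cleaning_fee) * 365 * occupancy
}

# Revenue accumulated over `years` years (geometric series with ratio 1.03)
cumulative_revenue <- function(rev_y1, years) {
  rev_y1 * (1 - growth^years) / (1 - growth)
}

# Return rate after `years` years relative to the purchase cost
return_rate <- function(rev_y1, cost, years) {
  cumulative_revenue(rev_y1, years) / cost - 1
}

# Number of years until cumulative revenue equals the purchase cost
break_even_years <- function(rev_y1, cost) {
  log(1 - cost * (1 - growth) / rev_y1, base = growth)
}

rev_10003 <- annual_revenue(231.3796, 94.06417)
rr10      <- return_rate(rev_10003, 2147000, 10)
be_10003  <- break_even_years(rev_10003, 2147000)

print(c(revenue = rev_10003, return_10y = rr10, break_even = be_10003))
```

By construction the return rate evaluated at the break-even horizon is zero, which gives a quick consistency check between the last two formulas.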
data2<-data%>%
group_by(zipcode) %>%
summarise(mean_rent_price=mean(price),mean_reviews=mean(number_of_reviews),mean_cleaning_fee=mean(cleaning_fee),purchase_price=mean(purchase_price),neighbourhood=first(neighbourhood_group_cleansed),SizeRank=first(SizeRank))%>%
mutate(occupancy_rate=0.75,
revenue_y01=mean_rent_price*365*0.75+mean_cleaning_fee*365*0.75*0.6,
return_rate_y01=(revenue_y01*1/(purchase_price)-1),
return_rate_y05=(revenue_y01*(1-1.03^5)/(1-1.03))/purchase_price-1,
return_rate_y10=(revenue_y01*(1-1.03^10)/(1-1.03))/purchase_price-1,
return_rate_y20=(revenue_y01*(1-1.03^20)/(1-1.03))/purchase_price-1,
return_rate_y30=(revenue_y01*(1-1.03^30)/(1-1.03))/purchase_price-1,
break_even=(log(1-purchase_price*(1-1.03)/(revenue_y01),base=1.03)))
kable(data2) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed","responsive"))
| zipcode | mean_rent_price | mean_reviews | mean_cleaning_fee | purchase_price | neighbourhood | SizeRank | occupancy_rate | revenue_y01 | return_rate_y01 | return_rate_y05 | return_rate_y10 | return_rate_y20 | return_rate_y30 | break_even |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10003 | 231.3796 | 26.583333 | 94.06417 | 2147000 | Manhattan | 21 | 0.75 | 78790.21 | -0.9633022 | -0.8051663 | -0.5793007 | -0.0139159 | 0.7459139 | 20.212421 |
| 10011 | 263.5057 | 12.206897 | 107.09186 | 2480400 | Manhattan | 15 | 0.75 | 89724.54 | -0.9638266 | -0.8079504 | -0.5853123 | -0.0280068 | 0.7209652 | 20.432301 |
| 10013 | 248.3286 | 18.700000 | 101.40835 | 3316500 | Manhattan | 1744 | 0.75 | 84636.27 | -0.9744802 | -0.8645122 | -0.7074446 | -0.3142746 | 0.2141130 | 26.296263 |
| 10014 | 244.3827 | 16.493827 | 95.23447 | 2491600 | Manhattan | 379 | 0.75 | 82542.03 | -0.9668719 | -0.8241183 | -0.6202232 | -0.1098349 | 0.5760842 | 21.813597 |
| 10021 | 147.8000 | 19.266667 | 73.88615 | 1815600 | Manhattan | 190 | 0.75 | 52596.05 | -0.9710310 | -0.8461998 | -0.6679033 | -0.2215931 | 0.3782105 | 24.046517 |
| 10022 | 243.9000 | 25.800000 | 101.29795 | 2031600 | Manhattan | 894 | 0.75 | 83405.81 | -0.9589458 | -0.7820374 | -0.5293590 | 0.1031431 | 0.9531730 | 18.557901 |
| 10023 | 236.2037 | 13.666667 | 101.67094 | 2142300 | Manhattan | 3 | 0.75 | 81360.22 | -0.9620220 | -0.7983698 | -0.5646251 | 0.0204824 | 0.8068179 | 19.695567 |
| 10025 | 205.6465 | 17.525253 | 92.38135 | 1431000 | Manhattan | 1 | 0.75 | 71469.36 | -0.9500564 | -0.7348424 | -0.4274521 | 0.3420045 | 1.3760897 | 15.914944 |
| 10028 | 214.1379 | 5.137931 | 97.34270 | 2083900 | Manhattan | 109 | 0.75 | 74608.80 | -0.9641975 | -0.8099197 | -0.5895646 | -0.0379738 | 0.7033181 | 20.590831 |
| 10036 | 267.6071 | 25.297619 | 103.19038 | 1712900 | Manhattan | 580 | 0.75 | 90206.48 | -0.9473370 | -0.7204049 | -0.3962776 | 0.4150749 | 1.5054648 | 15.252949 |
| 10128 | 208.0208 | 6.854167 | 81.83077 | 1787100 | Manhattan | 14 | 0.75 | 70386.41 | -0.9606142 | -0.7908953 | -0.5484857 | 0.0583118 | 0.8737970 | 19.157631 |
| 10304 | 278.5000 | 1.000000 | 52.91153 | 328300 | Staten Island | 1958 | 0.75 | 84930.09 | -0.7413034 | 0.3734554 | 1.9656666 | 5.9512746 | 11.3075984 | 3.711935 |
| 10305 | 90.0000 | 3.333333 | 60.13718 | 425100 | Staten Island | 2087 | 0.75 | 34515.03 | -0.9188073 | -0.5689367 | -0.0692163 | 1.1816792 | 2.8627781 | 10.637754 |
| 10306 | 269.0000 | 71.500000 | 65.00000 | 352900 | Staten Island | 668 | 0.75 | 84315.00 | -0.7610796 | 0.2684607 | 1.7389543 | 5.4198799 | 10.3667361 | 4.001684 |
| 10308 | 473.0000 | 32.000000 | 25.00000 | 409500 | Staten Island | 4149 | 0.75 | 133590.00 | -0.6737729 | 0.7319840 | 2.7398282 | 7.7658445 | 14.5203902 | 2.976260 |
| 10312 | 167.0000 | 0.000000 | 85.82307 | 355000 | Staten Island | 764 | 0.75 | 59812.69 | -0.8315136 | -0.1054826 | 0.9315083 | 3.5272939 | 7.0158127 | 5.543713 |
| 11201 | 170.8254 | 18.206349 | 85.91404 | 1420700 | Brooklyn | 32 | 0.75 | 60874.83 | -0.9571515 | -0.7725116 | -0.5087902 | 0.1513547 | 1.0385342 | 17.954432 |
| 11215 | 144.4769 | 14.253846 | 82.19266 | 1070800 | Brooklyn | 71 | 0.75 | 53050.70 | -0.9504569 | -0.7369692 | -0.4320444 | 0.3312404 | 1.3570314 | 16.017441 |
| 11217 | 165.6790 | 23.135803 | 88.92877 | 1302300 | Brooklyn | 1555 | 0.75 | 59961.18 | -0.9539575 | -0.7555540 | -0.4721740 | 0.2371799 | 1.1904923 | 16.973871 |
| 11231 | 166.1719 | 16.437500 | 89.21682 | 1202900 | Brooklyn | 1817 | 0.75 | 60143.41 | -0.9500013 | -0.7345502 | -0.4268211 | 0.3434833 | 1.3787081 | 15.900966 |
| 11234 | 156.2500 | 19.500000 | 80.41153 | 476900 | Brooklyn | 52 | 0.75 | 55981.03 | -0.8826147 | -0.3767857 | 0.3456905 | 2.1541860 | 4.5846527 | 7.699521 |
| 11434 | 335.0000 | 16.000000 | 46.10769 | 382300 | Queens | 622 | 0.75 | 99279.44 | -0.7403101 | 0.3787288 | 1.9770533 | 5.9779641 | 11.3548536 | 3.698486 |
Summary:
Although Staten Island has the highest return rates and the shortest break-even periods, it has far fewer potential customers than Manhattan and Brooklyn.
Although Manhattan has the highest property costs and much longer break-even periods than other areas, some zip codes there are still worth investing in. Real estate prices there are more likely to increase, and higher population density means a potentially higher occupancy rate.
Recommendation:
I would recommend zip code 10025 in Manhattan and zip code 11234 in Brooklyn, for the following reasons:
Population: Both zip codes have a very high population size rank compared to other zip codes.
Break-even period: Both zip codes are in the top 10 by average break-even period, and both break even in under 20 years.
Return rate: After ten years, zip code 11234 could reach a positive return rate of about 35%, while the return rate of 10025 grows more quickly than that of most zip codes in Manhattan.
Total cost: These are the cheapest zip codes in Manhattan and Brooklyn respectively, so the company can better control the budget for this investment.
Reviews: The average number of reviews in both zip codes is in the middle-to-high range, which suggests people are more willing to stay here than in many other parts of Manhattan and Brooklyn.
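The break-even claim in the recommendation can be checked directly against the break_even column of the results table. The sketch below copies those values into a named vector and ranks them, with rank 1 being the shortest break-even period.

```r
# break_even values copied from the results table, named by zipcode
break_even <- c(
  "10003" = 20.212421, "10011" = 20.432301, "10013" = 26.296263,
  "10014" = 21.813597, "10021" = 24.046517, "10022" = 18.557901,
  "10023" = 19.695567, "10025" = 15.914944, "10028" = 20.590831,
  "10036" = 15.252949, "10128" = 19.157631, "10304" = 3.711935,
  "10305" = 10.637754, "10306" = 4.001684,  "10308" = 2.976260,
  "10312" = 5.543713,  "11201" = 17.954432, "11215" = 16.017441,
  "11217" = 16.973871, "11231" = 15.900966, "11234" = 7.699521,
  "11434" = 3.698486
)

# Rank 1 = shortest break-even period
ranks <- rank(break_even)

recommended <- c("10025", "11234")
print(ranks[recommended])
print(break_even[recommended])
```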
In the future, we need to investigate:
How does the macroeconomic environment, for example a change in the number of tourists, influence the return rate of real estate?
How might the investment change in the future, for example if the company wants to sell its properties at some point? In that case we need to consider more financial variables, such as the increase in the value of the real estate.
What other expenses will the investment bring, for example taxes, maintenance costs, marketing costs, etc.?
Will buying more properties bring economies of scale? This could help revenue grow.
What is the company's market strategy? Does it want to target the high-price market with a smaller sales volume or the low-price market with a bigger sales volume?
breakeven<-data2%>%
ggplot(aes(x = reorder(zipcode, break_even), y = break_even,fill=neighbourhood)) +
geom_bar(stat = "identity") +
scale_fill_brewer(palette="Blues")+
labs(title="Average Breakeven Period vs Zipcodes",x = "Zipcode", y = "Breakeven Period")+
theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5),
legend.position = "bottom",
legend.title = element_text(face = "bold"),
panel.background = element_rect(fill = "white", colour = "white"),
panel.grid.major = element_line(colour = "grey70",linetype = "dashed"),
panel.grid.major.y = element_blank(),
axis.line = element_line(size = 1, colour = "grey80"),
axis.text = element_text(colour = "peru"),
axis.ticks.length = unit(0.25, "cm"),
axis.ticks = element_line(size = 0.2))+
coord_flip()
breakeven
revenue_year20<-
ggplot(data=data2,aes(x = zipcode, y = return_rate_y10,fill=neighbourhood)) +
geom_col() +
scale_fill_brewer(palette="Blues")+
labs(title="Return rate over 10 years vs Zipcodes",x = "Zipcode", y = "Return rate")+
theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5),
legend.position = "bottom",
legend.title = element_text(face = "bold"),
panel.background = element_rect(fill = "white", colour = "white"),
panel.grid.major = element_line(colour = "grey70",linetype = "dashed"),
panel.grid.major.y = element_blank(),
axis.line = element_line(size = 1, colour = "grey80"),
axis.text = element_text(colour = "peru"),
axis.ticks.length = unit(0.25, "cm"),
axis.ticks = element_line(size = 0.2))+
coord_flip()
revenue_year20
revenue_year<-
ggplot(data=data2,aes(x = zipcode, y = revenue_y01,fill=neighbourhood)) +
geom_col() +
scale_fill_brewer(palette="Blues")+
labs(title="Average Revenue/year vs Zipcodes",x = "Zipcode", y = "Average Revenue")+
theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5),
legend.position = "bottom",
legend.title = element_text(face = "bold"),
panel.background = element_rect(fill = "white", colour = "white"),
panel.grid.major = element_line(colour = "grey70",linetype = "dashed"),
panel.grid.major.y = element_blank(),
axis.line = element_line(size = 1, colour = "grey80"),
axis.text = element_text(colour = "peru"),
axis.ticks.length = unit(0.25, "cm"),
axis.ticks = element_line(size = 0.2))+
coord_flip()
revenue_year
Total_cost<-data2%>%
ggplot(aes(x = zipcode, y = purchase_price,fill=neighbourhood)) +
geom_col() +
scale_fill_brewer(palette="Blues")+
labs(title="Total_cost vs Zipcodes",x = "Zipcode", y = "Total_cost")+
theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5),
legend.position = "bottom",
legend.title = element_text(face = "bold"),
panel.background = element_rect(fill = "white", colour = "white"),
panel.grid.major = element_line(colour = "grey70",linetype = "dashed"),
panel.grid.major.y = element_blank(),
axis.line = element_line(size = 1, colour = "grey80"),
axis.text = element_text(colour = "peru"),
axis.ticks.length = unit(0.25, "cm"),
axis.ticks = element_line(size = 0.2))+
coord_flip()
Total_cost
sizerank<-data2%>%
ggplot(aes(x = zipcode, y = SizeRank,fill=neighbourhood)) +
geom_col() +
scale_fill_brewer(palette="Blues")+
labs(title="Population Rank vs Zipcodes",x = "Zipcode", y = "Population Rank")+
theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5),
legend.position = "bottom",
legend.title = element_text(face = "bold"),
panel.background = element_rect(fill = "white", colour = "white"),
panel.grid.major = element_line(colour = "grey70",linetype = "dashed"),
panel.grid.major.y = element_blank(),
axis.line = element_line(size = 1, colour = "grey80"),
axis.text = element_text(colour = "peru"),
axis.ticks.length = unit(0.25, "cm"),
axis.ticks = element_line(size = 0.2))+
coord_flip()
sizerank
number_review<- data2%>%
ggplot(aes(x = zipcode, y = mean_reviews,fill=neighbourhood)) +
geom_col() +
scale_fill_brewer(palette="Blues")+
labs(title="Number of reviews vs Zipcodes",x = "Zipcode", y = "Number of Reviews")+
theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5),
legend.position = "bottom",
legend.title = element_text(face = "bold"),
panel.background = element_rect(fill = "white", colour = "white"),
panel.grid.major = element_line(colour = "grey70",linetype = "dashed"),
panel.grid.major.y = element_blank(),
axis.line = element_line(size = 1, colour = "grey80"),
axis.text = element_text(colour = "peru"),
axis.ticks.length = unit(0.25, "cm"),
axis.ticks = element_line(size = 0.2))+
coord_flip()
number_review