Project Stages

PROBLEM STATEMENT

A real estate company that has a niche in purchasing properties to rent out short-term as part of their business model specifically within New York City. The real estate company has already concluded that two bedroom properties are the most profitable; however, they do not know which zip codes are the best to invest in.

The real estate company has engaged your firm to build out a data product and provide your conclusions to help them understand which zip codes would generate the most profit on short term rentals within New York City.

ASSUMPTIONS

Following were the assumptions for this project:-

  1. All the data from different data sources (Zillow and Aitbnb) is correct in terms of values, zipcodes and their corresponsing cities

  2. There are no seasonality variations- prices of properties on Zillow and price of Airbnb listings are not affected by seasonality.

  3. There is no wear and tear to properties- all properties are in good contions and will not require any significant maintainence or repair charges

  4. Tax to be paid for bying the property is constant and doesnt change by zipcode

  5. The investor will pay for the property in cash (i.e. no mortgage/interest rate will need to be accounted for).

  6. The time value of money discount rate is 0% (i.e. $1 today is worth the same 100 years from now).

  7. All properties and all square feet within each locale can be assumed to be homogeneous (i.e. a 1000 square foot property in a locale such as Bronx or Manhattan generates twice the revenue and costs twice as much as any other 500 square foot property within that same locale.)

  8. Occupancy rate of 75% throughout the year for Airbnb properties

PACKAGES REQUIRED

Following packages were used:

  • data.table: Importing the data

  • plotly : Used to plot interactive charts

  • tidyverse : Data manipulation and Plotting Graphs

  • leaflet : Creatioon of interactive maps

  • DT : Display the data on the screen in a scrollable form

  • ggthemes : For themes

#Installing package if it is not present
package_listing <- c('data.table', 'tidyverse','DT' , 'leaflet','plotly','ggthemes')

for (required_package in package_listing) {
  if (!require(required_package, character.only = T, quietly = T)) {
    install.packages(required_package, repos = "http://cran.us.r-project.org")
    library(required_package, character.only = T)
  }
}


#Loading the packages
library(data.table)
library(tidyverse)
library(DT)
library(plotly)
library(leaflet)
library(ggthemes)

DATA PREPARATION AND CLEANING

Data Description

There are 2 data sources from which data is provided

  • Zillow data- Cost for two-bedroom properties by zipcode.

  • AirBnB data- Detailed Information about property listing in New York including location, number of bedrooms, reviews, price, availability, property description.

The data was imported from secure sources provided by the company

Data Import

The data was imported from secure sources provided by the company

The Zillow dataset has 40753 observations and 95 colums. The Airbnb dataset has 8946 observations and 262 colums.

#Import the Data for Airbnb listings and Zillow

airbnb_listings <- fread("listings.csv")
## 
Read 32.2% of 62029 rows
Read 64.5% of 62029 rows
Read 40753 rows and 95 (of 95) columns from 0.144 GB file in 00:00:05
zillow <- fread("Zip_Zhvi_2bedroom.csv")



#Check the dimensions of the datasets 
dim(airbnb_listings)
## [1] 40753    95
dim(zillow)
## [1] 8946  262

Data Preparation

  • The zillow and airbnb datasets were merged together on the basis of zipcodes.We are finally left with data from 22 zipcodes in our final dataset which are from 4 counties.

  • Since the real estate company has already identified 2 bedroom properties, filter was applied on airbnb listings.

  • For our analysis, we keep the colums which are relevant to the problem statement and remove the rest of the columns.

From the Airbnb Dataset we remove majority of the columns containing information about the host because it is not relevant after purchasing new properties.

From the Zillow Dataset we remove attributes containing information about the cost of property from 1996 till the month before last 12 months. This is because the time value of money is 0. So there is no need for to incorporate net present value of a property.

We keep the following colums from Airbnb data

  • ‘id’
  • ‘street’
  • ‘neighbourhood’
  • ‘neighbourhood_cleansed’
  • ‘neighbourhood_group_cleansed’
  • ‘city’
  • ‘state’
  • ‘zipcode’
  • ‘market’
  • ‘latitude’
  • ‘longitude’
  • ‘price’
  • ‘weekly_price’
  • ‘number_of_reviews’
  • ‘review_scores_rating’,
  • ‘review_scores_location’
  • ‘review_scores_value’
  • ‘reviews_per_month’

We keep the following colums from Zillow data

  • ‘RegionID’
  • ‘RegionName’
  • ‘City’
  • ‘State’
  • ‘CountyName’
  • ‘SizeRank’
  • ‘2016-07’ to ‘2017-06’ (last 12 months)
#### Selecting last 12 months data
  zillow_last_12 <- zillow[,(ncol(zillow) - (12 - 1)):ncol(zillow)]
  zillow_new <- cbind(zillow[,1:7],zillow_last_12)

#### Joining airbnb and Zillow data based on zipcodes 
merged_data <- merge(airbnb_listings,zillow_new, by.x = "zipcode", by.y = "RegionName")

#### Selcting only 2 bedroom properties as per business requirement
merged_data <- merged_data %>% filter(bedrooms == 2)


####Dropping non-required columns as per data & business understanding
    drop_columns <- function(df, drop) {
    df <- df[, !names(df) %in% drop, drop = FALSE]
    df
    }
  
  #This is to remove the effects of a good orginial host on analysis
    non_required_cols <- c('listing_url',
                         'scrape_id',
                         'last_scraped',
                         'experiences_offered',
                         'notes',
                         'transit',
                         'access',
                         'interaction',
                         'house_rules',
                         'thumbnail_url',
                         'medium_url',
                         'picture_url',
                         'xl_picture_url',
                         'description',
                         'host_id',
                         'host_url',
                         'host_name',
                         'host_since',
                         'host_location',
                         'host_about',
                         'host_response_time',
                         'host_response_rate',
                         'host_acceptance_rate',
                         'host_is_superhost',
                         'host_thumbnail_url',
                         'host_picture_url',
                         'host_neighbourhood',
                         'host_listings_count',
                         'host_total_listings_count',
                         'host_verifications',
                         'host_has_profile_pic',
                         'host_identity_verified',
                         'neighbourhood',
                         'smart_location',
                         'country_code',
                         'country',
                         'is_location_exact',
                         'beds',
                         'bed_type',
                         'weekly_price',
                         'monthly_price',
                         'cleaning_fee',
                         'minimum_nights',
                         'maximum_nights',
                         'calendar_updated',
                         'has_availability',
                         'calendar_last_scraped',
                         'first_review',
                         'last_review',
                         'requires license',
                         'license',
                         'jurisdiction_names',
                         'instant_bookable',
                         'require_guest_profile_picture',
                         'require_guest_phone_verification',
                         'calculated_host_listings_count',
                         'name',
                         'summary',
                         'space',
                         'experiences_offered',
                         'transit',
                         'street',
                         'room_type',
                         'amenities',
                         'security_deposit',
                         'guests_included',
                         'extra_people',
                         'requires_license',
                         'cancellation_policy',
                         'City',
                         'State', 
                         'Metro', 
                         'CountyName')
    clean_data <- drop_columns(merged_data, non_required_cols)

Data Cleaning and Quality Check

For data quality check following steps were performed

  • Missing Data- Check missing values for each column and remove columns with 50% NA values
  • Incorrect Data-
  1. Check for negative numeric values such as price, cost etc.
  2. Check for duplicates records
  3. Removing all values where price is 0
  • Converting Data into proper format-The data regarding price is in textual format. We need to remove “$”&“,” symbols from price and convert it into a numerical variable.
#### Check missing values for each column and remove columns with 50% NA values

  sum(is.na(clean_data))  #total missing values
  apply(is.na(clean_data), 2, sum) #missing values for each column
  colnames(clean_data)[apply(is.na(clean_data), 2, any)] # identifying colnames containing missing values
  
  clean_data <- clean_data[,colSums(is.na(clean_data)) < (0.5*nrow(clean_data))] #No of nulls is less than 50%


####Check for negative numeric values
   neg_count <- sapply(clean_data, function(y) sum(length(which(y < 0))))
   nrow(neg_count[which(neg_count > 0)])  # total 0 variables have negative values
   

  
####Check for duplicates
   clean_data[duplicated(clean_data$id),] # duplicate rows = 0
   

#### Price Converstion to proper format and calculating mean value of a property
   
   
   #Converting price into proper format 
   clean_data$price <- as.numeric(gsub('[$,]','', clean_data$price))
   
   
  #Removing all values where price is 0
   clean_data <- clean_data %>%  subset(price != 0) %>% subset(price != 0) # removing all players' whose Valuation and Wage is 0.

Data Manupulation

  • For analyzing how much would a property cost on average, the cost of properties from the last 12 months were averaged

  • Since the price of an airbnb listing is nightly, this was converted to get price of a property for the whole year by assuming an occupancy rate of 75% throughout the year.

  • Breakeven point is the point at which total cost and total revenue are equal. There is no net loss or gain. This was calculated in years by diving mean cost by yearly price of a property.We will use this as an indicator of profitablity in our decision making

  • Most of the categorical attributes were in textual format. These were converted into factors accordingly.

  • In order to identify the most profitable households, it is essential to identify the yearly mean price and cost of a property by zipcode. Hence a new dataset was formed specifically for this purpose.

#Calculating total mean value of property 
    clean_data$mean_property_cost <- rowMeans(clean_data[,(ncol(clean_data)-(12-1)):ncol(clean_data)], na.rm = TRUE)
  

  
  #Calculating price for entire year and itdentifying and removing outliers
    occupancy_rate <- 0.75 #Assuming a default occupancy rate for an Airbnb property to 75%
    availability <- 365
    clean_data$yearly_airbnb_price <- clean_data$price * availability * occupancy_rate

# Breakeven_period 
    clean_data$breakeven_period <- clean_data$mean_property_cost / clean_data$yearly_airbnb_price
  
  #Renaming columns for zipcode and populationrank
    colnames(clean_data)[which(names(clean_data) == "RegionName")] <- "zipcode"
    colnames(clean_data)[which(names(clean_data) == "SizeRank")] <- "PopulationRank"
    
  #Converting necessary columns to factors
    clean_data$zipcode <- as.factor(clean_data$zipcode)
    clean_data$PopulationRank <- as.factor(clean_data$PopulationRank)
    clean_data$neighbourhood_group_cleansed <- as.factor(clean_data$neighbourhood_group_cleansed)
    clean_data$property_type <- as.factor(clean_data$property_type)
  
  
  # Creating a new dataset for calculating the mean cost, price and breakeven period for each zipcode
  breakeven_data <- clean_data %>%
                      select(zipcode,neighbourhood_group_cleansed,mean_property_cost,breakeven_period,yearly_airbnb_price) %>% 
                      group_by(zipcode,neighbourhood_group_cleansed) %>%
                      summarise(mean_breakeven = mean(breakeven_period),
                      mean_yearly_price = mean(yearly_airbnb_price),
                      mean_yearly_cost = mean(mean_property_cost))
  
  #Creating a 5,10,20 and 25 year profitabiltiy of a zipcode
  breakeven_data$n_5 <-  (breakeven_data$mean_yearly_price * 5 ) - breakeven_data$mean_yearly_cost
  breakeven_data$n_10 <- (breakeven_data$mean_yearly_price * 10 ) - breakeven_data$mean_yearly_cost
  breakeven_data$n_15 <- (breakeven_data$mean_yearly_price * 15 ) - breakeven_data$mean_yearly_cost
  breakeven_data$n_20 <- (breakeven_data$mean_yearly_price * 20 ) - breakeven_data$mean_yearly_cost
  breakeven_data$n_25 <- (breakeven_data$mean_yearly_price * 25 ) - breakeven_data$mean_yearly_cost

Data Summary

Our final dataset has 1238 observations and 45 columns.

#Final dataset
  final_data <- clean_data

#Dimensions of final data
dim(final_data)
## [1] 1238   45
#Structure of final data
summary_data <- data.table( attribute = names(final_data),
             class = sapply(final_data, class),
             missing_values = colSums(is.na(final_data)))

datatable(final_data)

EXPLORATORY DATA ANALYSIS

Now, that the data cleaning and munging processes are complete, the analysis stage would proceed. The step-by-step analysis is being segemented into 3 categories.

Georgraphical Analysis

Zipcodes across neighbourhoods

The EDA process starts with visualizing the zipcodes for all 4 neighbourhoods in NYC. From the below graph we can see that Manhattan neighbourhood has the largest number of zipcodes followed by Brookyn.

ggplot(final_data, aes(neighbourhood_group_cleansed)) + 
  geom_bar(aes(fill=zipcode), width = 0.5) + 
  theme(axis.text.x = element_text(angle=65, vjust=0.6)) +
  labs(title="Zipcodes across neighbourhoods", 
       x = "Neighbourhood", 
       y ="Num,ber of Zipcodes")





Number of Properties across zipcodes

Next, we would visulaize the number of properties present in each of the zipcodes. This would help us understand how the data is spread for property listing and help us corroborate a logical conclusion from the graph above that since Manhattan and Brooklyn have higher number of zipcodes present in them, they would naturally have higher number of properties listed as compared to Queens and Staten Island. the below graph does exactly that. Zipcodes 10003 and 11215 have the highest number of properties listed inside them.

  final_data %>% select(zipcode,neighbourhood_group_cleansed) %>%
  group_by(neighbourhood_group_cleansed,zipcode) %>% count() %>%
  plot_ly(x =~n,y=~zipcode,color = ~neighbourhood_group_cleansed, type = 'bar',
          text = ~n,textposition = "auto") %>%
  layout(title = "Number of Properties by Zipcodes",
         yaxis = list(title = "Zipcode"),
         xaxis = list(title = "Number of Airbnb Properties")) 





Price per night in each neighbourhood

From the boxplot, we can observe that there is huge variation in price per night for properties in Manhattan and have quite a few outlier values. Queens and Staten Island have consistency in price per night for the properties listed.

  plot_ly(final_data, x = ~neighbourhood_group_cleansed,y = ~price,type = "box", jitter = 0.7,
  color = ~neighbourhood_group_cleansed) %>%
  layout(title = "Variation in Price per Night for a Airbnb property by Neighbourhood",
          yaxis = list(title = "Price per Night"),
          xaxis = list(title = "Neighbourhood")) 

Population Rank and Ratings of Airbnb Property

Correlation between Location Review of Airbnb Property and its Price

General knowledge tells us that tourist spots are scattered in NYC and hence living in a location that is close to many attractions would be a plus for tourists. A property that has good rating review as far as location is concerned would naturally charge a high price per night. This is visualized in the scatter plot below. We can see that as the value of location rating increases so does the price per night of that property.

  plot_ly(data = final_data, x = ~review_scores_location, y = ~price,
        marker = list(size = 10,
                       color = 'rgba(255, 182, 193, .9)',
                       line = list(color = 'rgba(152, 0, 0, .8)',
                                   width = 2))) %>%
  layout(title = 'Correlation between Location of Property and its Price',
         xaxis = list(zeroline = FALSE, title = 'Rating of Airbnb Property'),
         yaxis = list(zeroline = FALSE, title = 'Price')) 





Correlation between Review of Population Rank and and its Price

Population of an area in which the property is present is determined by the Population Rank attribute. The lower the number the greater the population. In order to see if there was any correlation between population of a neighbourhood and the nightly price of propety, the below visulaization was created. As can be seen there is no such correlation.

    plot_ly(data = final_data, x = ~PopulationRank, y = ~price,
        marker = list(size = 10,
                       color = 'rgba(155, 152, 193, .9)',
                       line = list(color = 'rgba(152, 0, 0, .8)',
                                   width = 2))) %>%
    layout(title = 'Correlation between Population Rank and Prices',
         caption = 'Lower the number the greater the population',
         xaxis = list(zeroline = FALSE, title = 'Population Rank'),
         yaxis = list(zeroline = FALSE, title = 'Price'))

Analysis of price and Breakeven period

Mean cost of Property by Zipcode and Neighbourhood

In order to better understand the cost structure of each property for the last 12 months, the average cost for a property was plotted for each neighbourhood. As can be seen from the below visualization, the cost of propeties in Manhattan neighbourhood are commanding higher as compared to their counterparts in Queens and Staten Island.

    final_data %>% plot_ly(x = ~ mean_property_cost,y = ~zipcode,color = ~neighbourhood_group_cleansed, type = 'scatter', size = ~mean_property_cost,
                     textposition = 'auto') %>%
    layout(title = "Mean cost of of Property by Zipcode and Neighbourhood",
         xaxis = list(title = "Mean cost of Airbnb Property"),
         yaxis = list(title = "Zipcode"))





Yearly Price of Airbnb Property by Zipcode and Neighbourhood

It is natural to conclude that neighbourhood that have costs higher would genreally have a higher price per night on Airbnb. To check this, the average price for a property was plotted for each neighbourhood. As we can see from below visualization, zipcodes 10036 and 1003 belonging to Manhattan neighbourhood have properties that have the highest asking price per night. On the other hand 10304, 10305, 10306, 10308 and 10312 belonging to Staten Island have low price per night on Airbnb.

    final_data %>% plot_ly(x = ~ yearly_airbnb_price,y = ~zipcode,color = ~neighbourhood_group_cleansed, type = 'scatter',
                     textposition = 'auto') %>%
    layout(title = "Yearly Price of Airbnb Property by Zipcode and Neighbourhood",
         xaxis = list(title = "Yearly price"),
         yaxis = list(title = "Zipcode"))





Comparison between mean price and mean cost of Airbnb properties across zipcodes

Since zipcode have many properties inside them with varying levels of cost and price per night, it is essential to visualize if the pattern for cost of properties and price per night of properties forllows the same pattern across zipcodes and neighbourhoods. The below visualization graphs the average cost and average price per night of zipcodes side by side. We can see that zipcodes 10312 and 10036 have a higher average price trend that the mean cost structure for their respective neighbourhoods.

cost_visualization <- breakeven_data %>% plot_ly(x = ~mean_yearly_cost,y = ~zipcode,color = ~neighbourhood_group_cleansed, type = 'bar',
                                                      text = ~round(mean_yearly_cost,1), textposition = 'auto') %>%
                          layout(title = "Comparison between mean price and mean cost of Airbnb properties across zipcodes",
                                 xaxis = list(title = "Mean cost per zipcode"),
                                 yaxis = list(title = "Zipcode"))
    
    price_visualization <- breakeven_data %>% plot_ly(x = ~mean_yearly_price,y = ~zipcode,color = ~neighbourhood_group_cleansed, type = 'bar',
                                                      text = ~round(mean_yearly_price,1), textposition = 'auto') %>%
                           layout(title = "Comparison between mean price and mean cost of properties across zipcodes",
                                    xaxis = list(title = "Mean price per zipcode"),
                                    yaxis = list(title = "Zipcode"))
    
    subplot(cost_visualization, price_visualization, shareY = TRUE, titleX = TRUE, margin = 0.1)





Breakeven Period Airbnb properties across zipcodes

The below graph identifies the breakeven period for different zipcodes across the 4 neighbourhoods.An interesting insight can be gathered from the below graph. Although the Manhattan neighbourhood commanded higher price per night of stay, its mean property cost was also very high. Hence this neighbourhood has the highest breakeven period.

Contrast this to the Staten Island neighbourhood zipcodes which have lower mean cost for properties on Zillow but have a fiarly high price per night value making it the most profitable neighbourhood.

Brooklyn zipcodes are doing a decent job in terms of breakeven period. Properties have fairly average cost in this zipcodes and the price per night on Airbnb for these properties is neither too high nor too low.

breakeven_data %>% plot_ly(x = ~order(mean_breakeven),y = ~zipcode,color = ~neighbourhood_group_cleansed, type = 'bar',
                                                      text = ~round(mean_breakeven,2), textposition = 'auto') %>%
                          layout(title = "Breakeven Period Airbnb properties across zipcodes",
                                 xaxis = list(title = "Mean breakeven period (in years)"),
                                 yaxis = list(title = "Zipcode"))





Interactive chart for a zipcode’s profitability on 5,10,15,20 and 25 year basis.

The final step in the analysis was to understand when will a zip code be profitable on a 5,10, 20 and 25 year plan basis. this would help the client better visualize their cash flows streams and return on investment. An interactive graph was plotted to visualize this. Number of years can be selected from the drop down menu.

      breakeven_data %>% plot_ly(y = ~n_5,x = ~zipcode,type = 'bar',textposition = 'auto') %>%
                          layout(title = "Breakeven Period Airbnb properties across zipcodes",
                                 caption = "Bars above 0 is profitable and Bars below 0 are in loss",
                                 yaxis = list(title = "Zipcode")) %>%
                         layout(
  title = "Interactive visualization for profitabiltiy of a zipcode",
  xaxis = list(domain = c(1, 1)),
  yaxis = list(title = "Profit/Loss"),
  updatemenus = list(
    list(
      y = 25,
      buttons = list(
        list(method = "restyle",
             args = list("y", list(breakeven_data$n_5)),  # put it in a list
             label = "5 years"),
        list(method = "restyle",
             args = list("y", list(breakeven_data$n_10)),  # put it in a list
             label = "10 years"),
        list(method = "restyle",
             args = list("y", list(breakeven_data$n_15)),  # put it in a list
             label = "15 years"),
        list(method = "restyle",
             args = list("y", list(breakeven_data$n_20)),  # put it in a list
             label = "20 years"),
        list(method = "restyle",
             args = list("y", list(breakeven_data$n_25)),  # put it in a list
             label = "25 years")))
))

CONCLUSION AND RECOMMENDATION

Conclusion

It can be concluded from the above analysis that

  • Manhattan and Brooklyn have higher number of zipcodes present in them, and have higher number of properties listed as compared to Queens and Staten Island.

  • There is huge variation in price per night for properties in Manhattan and have quite a few outlier values. Queens and Staten Island have consistency in price per night for the properties listed.

  • As the value of location rating a property resies in increases so does the price per night of that property.

  • There is no correlation between population of a neighbourhood and the nightly price of propety.

  • Zipcodes belonging to Manhattan neighbourhood have properties that have the highest asking price per night.

  • Although the Manhattan neighbourhood commanded higher price per night of stay, its mean property cost was also very high. Hence this neighbourhood has the highest breakeven period.

  • Staten Island neighbourhood zipcodes which have lower mean cost for properties on Zillow but have a fiarly high price per night value making it the most profitable neighbourhood.

  • Brooklyn zipcodes are doing a decent job in terms of breakeven period. Properties have fairly average cost in this zipcodes and the price per night on Airbnb for these properties is neither too high nor too low.



Recommendation

  • Avoid investing in Manhattan neighbourhood zipcodes- Although the Manhattan neighbourhood commanded higher price per night of stay, its mean property cost was also very high which leadsd to this neighbourhood having the highest breakeven period.

  • Invest in Staten neighbourhood zipcodes- Properties in the zipcodes belonging to Staten Island would reach breakeven the earliest and would be profitable that other zipcode properties. These zipcodes are
  1. 10304
  2. 10305
  3. 10306
  4. 10312
  • Zipcode 10312 would reach breakeven period earliest among all NYC zipcodes and the client should invest heavily in buying properties in this zipcode.

FUTURE STEPS

Additional steps that would be needed to perform to make a better decision or deliver a better product to the real estate company are:-

Business Decisions

  • Customer Segmentation- It is crucial before entiring into any new market segment to understand the choice patters of consumer behaviour. Hence segmenting the customer on basis of type of tourist (leisure, business etcs), financial spending power, age etc would be helpful.

  • Competitor Analysis- Before making any investment decision, analyzing the strategy of competitors is necessary. If competitors are agressive in their pricing or marketing strategies, then the real estate company would have to make additional investments for marketing or move towards long term plan.Companies like Roomorama, Homeaway, and the Expedia (EXPE)-owned Housetrip all have a dedicated following of their own, and are all beginning to eat into the market share.

  • Economic Risks- Tourism industry is generally booming if the economy is healthy. This needs to be factored in before making large investments in properties since the breakeven period is in years.

  • Financial health of Airbnb- Although the company appears to be health and has raised more than $3 billion in venture capital since its inception, taking on industry behemoth such as Wyndham and Marriot might lead to them taking an aggressive approach leading to Airbnb loosing its market share. Airbnb also is widely reagrded to have an IPO in 2018 and it might not perform as well as it should. Since the client would be using Airbnb’s platform, this would have adverse effect on the real estate company’s cashflows.

  • Regulatory Issues Airbnb has a lot of regulartory and legal hurdles. Last fall, the New York State Senate passed a bill that makes online apartment listings for stays shorter than 30 days illegal, which, not surprisingly, thwarts Airbnb in their goal to expand its market. This woudl definitely have effect on the clients’ revenue.



Incorporating data from additional sources

Next steps in this analysis would be to incorporate additional data about

  • Crime Data- Safety is one of the important factors for toursists and they generally prefer crime free neighbourhoods for renting properties. By incorporating the crime data, we would be able to factor this information.

  • Weather Data- Weather plays an important role for tourists to decide to visit a particular city during harsh weathers. Hence , it is important to factor in the seasonality of a particular city as any city would have peak and lows tourist periods accordingly.

  • Social Media Data- The real estate company can scrape through the social media platformsdata to identify the impressions a particular neighbourhood is having and based on popularity can make a better decision.

  • Traffic and Ease of Public Transport Data- If a zipcode is heavily congested with traffic or if it doesnt have proper public transport channels neareby, then there are high chances that toursits will avoid such zipcodes. So analysis of this data needs to be performed.