1. Visualising AirBnB Occupancies and Revenue Generated Over Time

Calendar heatmaps lay out daily values in a calendar-like grid, which makes patterns, trends, and anomalies over time easy to spot. Here, they show how AirBNB occupancies and revenue changed over 2019 and 2020.
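A calendar heatmap only needs three derived fields per day: the year, the week of the year, and the weekday. Here is a minimal sketch of that layout, assuming made-up data. The `daily` data frame and its `bookings` column are hypothetical stand-ins, not our project data, and the plot is drawn only if ggplot2 is installed.

```r
# Toy daily series standing in for real booking counts (values are made up).
set.seed(1)
daily <- data.frame(date = seq(as.Date("2019-01-01"), as.Date("2020-12-31"), by = "day"))
daily$bookings <- rpois(nrow(daily), lambda = 50)

# Calendar coordinates: one column per week, one row per weekday, one panel per year.
daily$year <- as.integer(format(daily$date, "%Y"))
daily$week <- as.integer(format(daily$date, "%W"))   # week of year, weeks start on Monday
daily$weekday <- factor(format(daily$date, "%u"),    # "1" = Monday ... "7" = Sunday
                        levels = as.character(7:1),
                        labels = c("Sun", "Sat", "Fri", "Thu", "Wed", "Tue", "Mon"))

# Draw one tile per day, coloured by the daily value.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(daily, aes(x = week, y = weekday, fill = bookings)) +
    geom_tile(colour = "white") +
    facet_wrap(~ year, ncol = 1) +
    labs(x = "Week of year", y = NULL, fill = "Bookings")
  print(p)
}
```

Swapping `bookings` for an occupancy rate or a revenue column gives the other heatmaps in this section.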

a. Total Occupancies in 2019 and 2020

Total occupancies in 2019 were generally at least twice those in 2020.

b. Occupancy Rates in 2019 and 2020

Occupancy rates are severely reduced in 2020.

c. Total Daily Revenue in 2019 and 2020

Total daily revenue decreased from about 6 million dollars a day to 2 or 3 million dollars a day.

d. Average Daily Revenue in 2019 and 2020

Average daily revenues are also lower in 2020.

2. Seasonal Changes in AirBnB Occupancy and Revenue Generated

We can view the same patterns using line graphs instead.
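One way to get from daily values to a seasonal line graph is to sum by month and draw one line per year. This is a rough sketch on made-up data, with `daily` and `bookings` as hypothetical names rather than our actual objects. The plot is drawn only if ggplot2 is installed.

```r
# Toy daily series (values are made up).
set.seed(2)
daily <- data.frame(date = seq(as.Date("2019-01-01"), as.Date("2020-12-31"), by = "day"))
daily$bookings <- rpois(nrow(daily), lambda = 50)

# Aggregate to one row per year and month.
daily$year  <- format(daily$date, "%Y")
daily$month <- as.integer(format(daily$date, "%m"))
monthly <- aggregate(bookings ~ year + month, data = daily, FUN = sum)
monthly <- monthly[order(monthly$year, monthly$month), ]

# One line per year, so 2019 and 2020 can be compared month by month.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(monthly, aes(x = month, y = bookings, colour = year, group = year)) +
    geom_line() +
    scale_x_continuous(breaks = 1:12, labels = month.abb) +
    labs(x = NULL, y = "Bookings per month", colour = "Year")
  print(p)
}
```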

3. Data Tables on Occupancy and Revenue of AirBnBs in New York City

The data tables allow users to see the number of bookings, listings, percentage of listings booked, total revenue, and average revenue for any day that they select.
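The five quantities in the tables can all be derived from a calendar table with one row per listing per day. The sketch below uses toy data and makes two assumptions: that `available == "f"` means the night was booked (hosts can also block dates, so this overstates bookings), and that `price` is already numeric. The column names are modelled on a typical calendar file and are not guaranteed to match ours.

```r
# Toy calendar: three listings over two days (values are made up).
calendar <- data.frame(
  listing_id = c(1, 2, 3, 1, 2, 3),
  date       = as.Date(c("2019-07-04", "2019-07-04", "2019-07-04",
                         "2019-07-05", "2019-07-05", "2019-07-05")),
  available  = c("f", "f", "t", "f", "t", "t"),
  price      = c(100, 150, 200, 100, 150, 200)
)
calendar$booked <- calendar$available == "f"   # assumption: unavailable means booked

# One summary row per day.
daily_stats <- do.call(rbind, lapply(split(calendar, calendar$date), function(d) {
  data.frame(
    date          = d$date[1],
    bookings      = sum(d$booked),
    listings      = nrow(d),
    pct_booked    = 100 * mean(d$booked),
    total_revenue = sum(d$price[d$booked]),
    avg_revenue   = mean(d$price[d$booked])
  )
}))
rownames(daily_stats) <- NULL

# Look up a single day, as a user would in the data table.
daily_stats[daily_stats$date == as.Date("2019-07-04"), ]
```

Here total revenue sums the nightly price over booked listings, and average revenue divides that by the number of bookings.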

4. Understanding Trends in AirBnB Reviews

The patterns in the reviews are similar to those in the occupancy and revenue trends.

On average, we see that only 1.5% of guests leave reviews.

5. What Makes a Host a Superhost?

We were interested to know what makes a host a superhost. We believe that response and acceptance rates could be closely linked to superhost status.

ggplot(data = superhost_summary, aes(x = host_is_superhost, y = avgResponse, fill = host_is_superhost)) +
  geom_col(width = 0.5) +
  labs(title = 'Average Response Rates of Superhosts and Non-Superhosts') +
  ylab('Response Rates (%)') +
  theme(plot.title = element_text(hjust = 0.5),
        legend.position = 'none',
        axis.title.x = element_blank())
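The plot above expects a `superhost_summary` table with one row per superhost status and an `avgResponse` column. The following is a sketch of how such a table could be built, not necessarily how ours was. It assumes response rates are stored as text like "95%" with "N/A" for missing values, and the `host_response_rate` column name is an assumption.

```r
# Toy listings data (values are made up).
listings <- data.frame(
  host_id            = c(1, 2, 3, 4, 5),
  host_is_superhost  = c("t", "t", "f", "f", "f"),
  host_response_rate = c("100%", "98%", "80%", "N/A", "60%"),
  stringsAsFactors   = FALSE
)

# Keep one row per host so hosts with many listings are not counted repeatedly.
hosts <- listings[!duplicated(listings$host_id), ]

# Strip the percent sign; "N/A" becomes NA.
hosts$response_num <- suppressWarnings(
  as.numeric(sub("%", "", hosts$host_response_rate, fixed = TRUE))
)

# Mean response rate per group (rows with NA are dropped by the formula interface).
superhost_summary <- aggregate(response_num ~ host_is_superhost, data = hosts, FUN = mean)
names(superhost_summary)[2] <- "avgResponse"
superhost_summary
```

The same pattern would work for acceptance rates by swapping in the corresponding column.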

6. Let’s do some text analysis!!

library(tidyverse)

###/Users/armaanahmed/Desktop/listings.csv 
##/Users/armaanahmed/Desktop/reviews.csv 
##/Users/armaanahmed/Desktop/calendar.csv 
##install.packages("textdata")

reviews <- read.csv("/Users/armaanahmed/Desktop/Data Viz AirBNB Data/Su/data2/reviews.csv")
listings <- read.csv("/Users/armaanahmed/Desktop/Data Viz AirBNB Data/Su/data2/listings.csv")
airbnb <- inner_join(listings, reviews, by=c("id" = "listing_id"))

## filter 2019-2020 data
airbnb <- airbnb %>% filter(date > "2018-12-31" & date < "2021-01-01")

## How many properties does a host own?
airbnb2 <- airbnb %>% group_by(host_id) %>%
  count(id) %>%
  arrange(desc(n)) %>%
  group_by(host_id) %>%
  count() %>% arrange(desc(n)) 

table(airbnb2$n)

## 
##     1     2     3     4     5     6     7     8     9    10    11    12    13 
## 12132  1546   413   195    89    48    34    31    18     7     8     6     2 
##    14    15    16    17    18    20    21    22    23    24    26    29    30 
##     5     1     4     1     2     1     3     2     2     1     2     3     1 
##    31    32    34    35    36    37    40    78    91    98 
##     1     2     1     1     1     1     2     1     1     1
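Because `airbnb` is the review-level join filtered to 2019-2020, the table above counts, for each host, the distinct listings that received at least one review in that window. A shorter way to compute the same thing is sketched below on toy data (the values are made up).

```r
# Toy review-level data: one row per review, so listings repeat.
airbnb <- data.frame(
  host_id = c(10, 10, 10, 20, 30, 30),
  id      = c(1,  1,  2,  3,  4,  5)
)

# Keep one row per (host, listing) pair, then count listings per host.
pairs <- unique(airbnb[, c("host_id", "id")])
listings_per_host <- table(pairs$host_id)

# Distribution: how many hosts own 1 listing, 2 listings, and so on.
table(as.vector(listings_per_host))
```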

Cleaning and PreProcessing Text

library(tm)

library(quanteda)

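## drop comments containing non-ASCII characters (a rough proxy for non-English text)
airbnb3 <- airbnb[which(!grepl("[^\x01-\x7F]+", airbnb$comments)),]


## remove stop words (stopwords() is called from the stopwords package explicitly, since tm also defines a function with that name)
airbnb3$comments <- removeWords(airbnb3$comments, stopwords::stopwords(language = "en", source = "stopwords-iso"))
airbnb3$comments <- removeWords(airbnb3$comments, stopwords::stopwords(language = "en", source = "marimo"))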

## remove numbers, whitespace, punctuation
airbnb3$comments <- removeNumbers(airbnb3$comments)
airbnb3$comments <- stripWhitespace(airbnb3$comments)
airbnb3$comments <- removePunctuation(airbnb3$comments)

## tolower
airbnb3$comments <- tolower(airbnb3$comments)
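The order of cleaning steps matters. Stop-word removal in tm matches case-sensitively, so a capitalised "The" survives unless the text is lower-cased first, and collapsing whitespace last tidies the gaps left by earlier steps. This is a base-R sketch of one such ordering, using a tiny made-up stop-word list. `clean_comment` and `toy_stop_words` are hypothetical names, not part of tm or quanteda.

```r
# Clean a character vector of comments in a fixed order.
clean_comment <- function(x, stop_words) {
  x <- tolower(x)                                   # 1. lower-case first
  pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  x <- gsub(pattern, " ", x, perl = TRUE)           # 2. drop whole-word stop words
  x <- gsub("[[:punct:]]+", " ", x)                 # 3. drop punctuation
  x <- gsub("[[:digit:]]+", " ", x)                 # 4. drop numbers
  x <- gsub("\\s+", " ", x)                         # 5. collapse runs of whitespace
  trimws(x)
}

toy_stop_words <- c("the", "was", "and", "a", "to")
clean_comment("The room was CLEAN, and 5 minutes to the subway!", toy_stop_words)
# returns "room clean minutes subway"
```

In our pipeline the later `unnest_tokens()` and `anti_join(stop_words)` steps lower-case the text and filter stop words again, so words missed here are largely caught there.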

Sentiment Analysis

##install.packages("tidytext")
library(tidytext)

tidy_ab  <- unnest_tokens(airbnb3,  output = word, input = comments) %>%
  anti_join(stop_words, by = "word")

afinn <- get_sentiments("afinn")
tidy_ab_sent <- inner_join(tidy_ab,  afinn, by = "word")

sent_by_rev <- tidy_ab_sent %>%
  group_by(host_id, reviewer_id) %>%
  mutate(rev_sent = mean(value))

summary(tidy_ab_sent$price)

##    Length     Class      Mode 
##    652281 character character

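## price was read in as character, so summary() reports only length/class/mode here;
## convert it first, e.g. readr::parse_number(tidy_ab_sent$price), to get quartiles
## 75% of the properties are cheaper than $145 per night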

 
tidy_ab_sent <- tidy_ab_sent %>%
  group_by(host_id, reviewer_id) %>%
  mutate(rev_sent = mean(value))

tidy_ab_sent$sentiment_fac <- cut(tidy_ab_sent$rev_sent, breaks = -5:5)
table(tidy_ab_sent$sentiment_fac)

## 
## (-5,-4] (-4,-3] (-3,-2] (-2,-1]  (-1,0]   (0,1]   (1,2]   (2,3]   (3,4]   (4,5] 
##      18     638    2334    5194   13835   49593  230525  318297   31337     509
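The scoring idea is to split each comment into words, keep the words that appear in a sentiment lexicon, average their values per review, and then bin the averages. It can be sketched in base R as follows. The lexicon is a made-up toy, not the real AFINN list, and `review_id` is a hypothetical key.

```r
# Toy lexicon and reviews (values are made up for illustration).
lexicon <- data.frame(word  = c("clean", "great", "dirty", "noisy", "terrible"),
                      value = c(2, 3, -2, -1, -3),
                      stringsAsFactors = FALSE)
reviews <- data.frame(review_id = 1:3,
                      comments  = c("great place very clean",
                                    "dirty and noisy room",
                                    "terrible terrible stay"),
                      stringsAsFactors = FALSE)

# One row per (review, word).
tokens <- do.call(rbind, lapply(seq_len(nrow(reviews)), function(i) {
  data.frame(review_id = reviews$review_id[i],
             word      = strsplit(tolower(reviews$comments[i]), "[^a-z]+")[[1]],
             stringsAsFactors = FALSE)
}))

# Inner join: words without a sentiment value drop out.
scored <- merge(tokens, lexicon, by = "word")

# Mean sentiment per review, then bin into unit-wide intervals.
rev_sent <- aggregate(value ~ review_id, data = scored, FUN = mean)
rev_sent$bin <- cut(rev_sent$value, breaks = -5:5, include.lowest = TRUE)
rev_sent
```

With the default `cut()` settings a score of exactly -5 falls outside `(-5,-4]` and becomes `NA`. `include.lowest = TRUE` closes that first interval.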

## <<DocumentTermMatrix (documents: 2, terms: 1528)>>
## Non-/sparse entries: 1528/1528
## Sparsity           : 50%
## Maximal term length: 17
## Weighting          : term frequency (tf)

Good Comments Word Cloud

What are the key words that are found in good comments?

Words like clean, nice, and recommend all come up! It seems like cleanliness, aesthetics, and social cues (like recommend) are the most important aspects of a good review.

Bad Comments Word Cloud

What are the key words that are found in bad comments?

Words like noisy, bad, dirty, block, and hard stops come up in bad reviews! People want a nice, quiet, clean place to stay!
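A word cloud is a plot of word frequencies, so the core step is counting words separately for good and bad reviews. Here is a minimal sketch on toy data. Splitting at a mean sentiment of 0 is an assumption, `word_freq` is a hypothetical helper, and the cloud is drawn only if the wordcloud package is installed.

```r
# Toy reviews with a per-review sentiment score (values are made up).
reviews <- data.frame(
  comments = c("clean and nice would recommend", "nice clean room",
               "noisy and dirty", "dirty bathroom noisy street"),
  rev_sent = c(2.5, 2, -1.5, -2),
  stringsAsFactors = FALSE
)

# Count how often each word occurs in a set of comments.
word_freq <- function(texts) {
  words <- unlist(strsplit(tolower(texts), "[^a-z]+"))
  words <- words[nchar(words) > 0]
  sort(table(words), decreasing = TRUE)
}

good_freq <- word_freq(reviews$comments[reviews$rev_sent > 0])
bad_freq  <- word_freq(reviews$comments[reviews$rev_sent < 0])

# Draw the cloud for good reviews; swap in bad_freq for the other one.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = names(good_freq), freq = as.vector(good_freq), min.freq = 1)
}
```

For a "dissimilar words" view, `wordcloud::comparison.cloud()` takes a term-by-document matrix with one column per group. That is one possible way to contrast the two sets of comments.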

Dissimilar words Word Cloud

7. How does the distribution of AirBNB locations look throughout NYC?

library(readxl)
library(ggplot2)
library(ggthemes)
library(dplyr)
library(maps)
library(tidyverse)
library(tmap)
library(ggmap)
library(hablar)
library(rgdal)
library(data.table)
library(devtools)
library(leaflet)
library(geojsonio)
library(readr)
library(RgoogleMaps)



reviews <- read_csv("/Users/armaanahmed/Desktop/untitled folder 2/reviews.csv")
calendar <- read_csv("/Users/armaanahmed/Desktop/untitled folder 2/calendar.csv")
listings <- read_csv("/Users/armaanahmed/Desktop/untitled folder 2/listings.csv")

airbnb <- read_csv("/Users/armaanahmed/Documents/GitHub/Group_O_Airbnb/AB_US_2020.csv")
airbnb<-subset(airbnb, city == "New York City")




##Create base layer map
map_TS_st1 <- get_map("New York City", zoom=12, 
                      source="stamen",maptype="toner-background")
ggmap_TS_st1 <- ggmap(map_TS_st1) 
ggmap_TS_st1

map2<-ggmap_TS_st1 + geom_point(aes(x=longitude,y=latitude),data=airbnb, 
                    size=1, alpha=0.9, color="blue")
map2

airbnbdt <- as.data.table(airbnb)

##let's bucket the prices into ranges
airbnb$pricerange <- as.character(cut(
  airbnb$price,
  breaks = c(-Inf, 100, 200, 300, 400, Inf),
  labels = c("$99 Bargain", "A steal", "Kinda Pricey", "Expensive", "Ultra-expensive?"),
  right  = FALSE   # intervals are [low, high), so a price of exactly 400 counts as "Ultra-expensive?"
))

##do some color work/differentiate price by color
library(RColorBrewer)
pal = colorFactor("Set1", domain = airbnb$pricerange) # Grab a palette
color_offsel1 = pal(airbnb$pricerange)

##popup content
content <- paste("Check this AirBNB out!!", "<br/>",
                 "Price:",airbnb$price,"<br/>",
                 "Number of Reviews:",airbnb$number_of_reviews,"<br/>",
                 "Type of Room:",airbnb$room_type,"<br/>")

interactiveairbnbmap <- leaflet(airbnb, options = leafletOptions(minZoom = 12, maxZoom = 18)) %>%  # Create a map widget
  addTiles() %>%
  addCircles(lat = ~latitude, lng = ~longitude, color = color_offsel1, popup = content) %>%
  addProviderTiles("NASAGIBS.ViirsEarthAtNight2012") %>%
  setView(lng = -73.96, lat = 40.78, zoom = 14)
interactiveairbnbmap

clusteredmap <- leaflet(airbnb, options = leafletOptions()) %>%  # Create a map widget
  addTiles() %>%    # Add default OpenStreetMap map tiles
  addCircleMarkers(lat = ~latitude, lng = ~longitude, color = color_offsel1, popup = content,
                   clusterOptions = markerClusterOptions()) %>%
  setView(lng = -73.96, lat = 40.78, zoom = 14) %>%
  addLegend(pal = pal, values = airbnb$pricerange,
            title = "AirBNB's in New York City <br/> Check it out!") %>%
  addProviderTiles("NASAGIBS.ViirsEarthAtNight2012")
clusteredmap

##It looks like the majority of AirBNBs are densely concentrated in either Manhattan or Long Island City, with listings becoming more and more scattered throughout the Bronx, Staten Island and Queens.

##Attach each listing's neighbourhood from the listings dataset
smalllistings <- listings %>% select(neighbourhood_cleansed, id)

airbnb <- left_join(airbnb, 
                      smalllistings, 
                       by=c("id"))

##Create a table that has the mean price by neighbourhood
##(the grouping column is an assumption, chosen to match the neighbourhood price table in section 8)
averagefines <- airbnb %>%
    group_by(neighbourhood_cleansed) %>%
    dplyr::summarize(Mean = mean(price, na.rm=TRUE))

##Save it as a dataframe
averagefines <- data.frame(averagefines)





8. How do prices vary by borough/neighborhood for AirBNBs?

##     Neighborhood                  Price Borough
##   7 Battery Park City         210.70175
##  14 Bedford-Stuyvesant        109.65472
##  17 Belmont                    49.11111
##  19 Bergen Beach              129.60000
##  23 Brighton Beach            113.58824
##  27 Bull's Head                53.00000
##  41 Claremont Village          71.04545
##  48 Columbia St               182.80645
##  50 Concourse                  80.51064
##  51 Concourse Village          60.33333
##  56 Ditmars Steinway           93.59487
##  64 East Harlem               130.61490
##  65 East Morrisania            86.58333
##  75 Fieldston                 350.11111
##  78 Flatiron District         338.63636
##  81 Fordham                    78.46667
##  88 Gramercy                  182.97642
##  98 Highbridge                 76.44444
## 102 Howland Hook              137.50000
## 104 Hunts Point                49.77778
## 114 Kingsbridge                88.90909
## 121 Longwood                   97.60465
## 127 Melrose                    54.28571
## 134 Morris Heights            120.52941
## 136 Morrisania                 80.18182
## 137 Mott Haven                107.48837
## 138 Mount Eden                108.75000
## 139 Mount Hope                 86.26316
## 141 Navy Yard                 134.50000
## 144 New Dorp Beach             82.00000
## 147 Nolita                    191.89362
## 148 North Riverdale            96.00000
## 149 Norwood                    73.19048
## 157 Port Morris                74.38462
## 161 Prospect-Lefferts Gardens 101.56129
## 169 Riverdale                 121.33333
## 171 Roosevelt Island          115.55814
## 177 Sheepshead Bay            137.28283
## 184 South Slope               150.15625
## 186 Spuyten Duyvil             91.33333
## 187 St. Albans                114.55357
## 188 St. George                 91.45714
## 190 Stuyvesant Town           175.17308
## 195 Throgs Neck                79.04545
## 199 Tremont                    78.71429
## 200 Tribeca                   408.27083
## 203 University Heights         52.31250
## 211 West Farms                 65.00000
## 219 Windsor Terrace           135.24051
## 221 Woodlawn                   53.75000

## 'data.frame':    222 obs. of  3 variables:
##  $ Neighborhood: chr  "Allerton" "Baychester" "Belmont" "Bronxdale" ...
##  $ Price       : num  109.7 85.5 49.1 55.9 72 ...
##  $ Borough     : chr  "Bronx" "Bronx" "Bronx" "Bronx" ...


Our choropleth maps are attached separately because the HTML files were too large: plotly_num_airbnb19.html and plotly_num_airbnb20.html, both also in our GitHub! AirBNB had more rooms and vacancies throughout NYC in 2019, while in 2020 the number of rooms and vacancies decreased due to COVID-19.

Thanks for a great semester!!

Group O Final Project

Armaan, Lavanya, Su, KJ

4/19/2021