FINAL 110

Author

Dajana R

Source: https://wallup.net/cityscape-new-york-city-sunset/

Introduction and Dataset Information

In this project, I explore the Airbnb listings dataset for New York City in 2019. The data set, obtained from AIRBNB NYC, contains 48,895 observations of 16 variables, including both quantitative and categorical variables. The data come from Airbnb, but the data collection method is not mentioned; I think the data were most likely scraped from the Airbnb website.

Variables

  • Price: The nightly rental price of the Airbnb listing.
  • Minimum Nights: The minimum number of nights required for booking.
  • Neighbourhood Group: The borough the listing is located in (Bronx, Brooklyn, Manhattan, Queens, or Staten Island).
  • Room Type: The type of room available for rent (Entire home/apt, Private room, or Shared room).

These variables will be checked for NAs using sum() and is.na(), and NAs will be handled by omitting them with na.omit().

Purpose

This data set offers insights into the pricing dynamics of Airbnb listings in New York City. By analyzing factors such as neighborhood, room type, and minimum nights required, I would like to discover trends that affect pricing. I chose this topic because I would like to visit the city someday, or even move to NYC.

Loading the Packages

### Loading ggplot2, dplyr, leaflet, tidyr, and plotly

library(ggplot2)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(leaflet)
Warning: package 'leaflet' was built under R version 4.3.3
library(tidyr)
Warning: package 'tidyr' was built under R version 4.3.3
library(plotly)

Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':

    last_plot
The following object is masked from 'package:stats':

    filter
The following object is masked from 'package:graphics':

    layout

Loading the data set

### Using readr to read the data set
NYCAB <- readr::read_csv("AB_NYC_2019.csv")
Rows: 48895 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (6): name, host_name, neighbourhood_group, neighbourhood, room_type, la...
dbl (10): id, host_id, latitude, longitude, price, minimum_nights, number_of...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Looking at the head of the data

head(NYCAB)
# A tibble: 6 × 16
     id name        host_id host_name neighbourhood_group neighbourhood latitude
  <dbl> <chr>         <dbl> <chr>     <chr>               <chr>            <dbl>
1  2539 Clean & qu…    2787 John      Brooklyn            Kensington        40.6
2  2595 Skylit Mid…    2845 Jennifer  Manhattan           Midtown           40.8
3  3647 THE VILLAG…    4632 Elisabeth Manhattan           Harlem            40.8
4  3831 Cozy Entir…    4869 LisaRoxa… Brooklyn            Clinton Hill      40.7
5  5022 Entire Apt…    7192 Laura     Manhattan           East Harlem       40.8
6  5099 Large Cozy…    7322 Chris     Manhattan           Murray Hill       40.7
# ℹ 9 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
#   minimum_nights <dbl>, number_of_reviews <dbl>, last_review <chr>,
#   reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
#   availability_365 <dbl>

Looking at the structure of the data

str(NYCAB)
spc_tbl_ [48,895 × 16] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ id                            : num [1:48895] 2539 2595 3647 3831 5022 ...
 $ name                          : chr [1:48895] "Clean & quiet apt home by the park" "Skylit Midtown Castle" "THE VILLAGE OF HARLEM....NEW YORK !" "Cozy Entire Floor of Brownstone" ...
 $ host_id                       : num [1:48895] 2787 2845 4632 4869 7192 ...
 $ host_name                     : chr [1:48895] "John" "Jennifer" "Elisabeth" "LisaRoxanne" ...
 $ neighbourhood_group           : chr [1:48895] "Brooklyn" "Manhattan" "Manhattan" "Brooklyn" ...
 $ neighbourhood                 : chr [1:48895] "Kensington" "Midtown" "Harlem" "Clinton Hill" ...
 $ latitude                      : num [1:48895] 40.6 40.8 40.8 40.7 40.8 ...
 $ longitude                     : num [1:48895] -74 -74 -73.9 -74 -73.9 ...
 $ room_type                     : chr [1:48895] "Private room" "Entire home/apt" "Private room" "Entire home/apt" ...
 $ price                         : num [1:48895] 149 225 150 89 80 200 60 79 79 150 ...
 $ minimum_nights                : num [1:48895] 1 1 3 1 10 3 45 2 2 1 ...
 $ number_of_reviews             : num [1:48895] 9 45 0 270 9 74 49 430 118 160 ...
 $ last_review                   : chr [1:48895] "10/19/2018" "5/21/2019" NA "7/5/2019" ...
 $ reviews_per_month             : num [1:48895] 0.21 0.38 NA 4.64 0.1 0.59 0.4 3.47 0.99 1.33 ...
 $ calculated_host_listings_count: num [1:48895] 6 2 1 1 1 1 1 1 1 4 ...
 $ availability_365              : num [1:48895] 365 355 365 194 0 129 0 220 0 188 ...
 - attr(*, "spec")=
  .. cols(
  ..   id = col_double(),
  ..   name = col_character(),
  ..   host_id = col_double(),
  ..   host_name = col_character(),
  ..   neighbourhood_group = col_character(),
  ..   neighbourhood = col_character(),
  ..   latitude = col_double(),
  ..   longitude = col_double(),
  ..   room_type = col_character(),
  ..   price = col_double(),
  ..   minimum_nights = col_double(),
  ..   number_of_reviews = col_double(),
  ..   last_review = col_character(),
  ..   reviews_per_month = col_double(),
  ..   calculated_host_listings_count = col_double(),
  ..   availability_365 = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 

Looking at the summary of the data

summary(NYCAB)
       id               name              host_id           host_name        
 Min.   :    2539   Length:48895       Min.   :     2438   Length:48895      
 1st Qu.: 9471945   Class :character   1st Qu.:  7822033   Class :character  
 Median :19677284   Mode  :character   Median : 30793816   Mode  :character  
 Mean   :19017143                      Mean   : 67620011                     
 3rd Qu.:29152178                      3rd Qu.:107434423                     
 Max.   :36487245                      Max.   :274321313                     
                                                                             
 neighbourhood_group neighbourhood         latitude       longitude     
 Length:48895        Length:48895       Min.   :40.50   Min.   :-74.24  
 Class :character    Class :character   1st Qu.:40.69   1st Qu.:-73.98  
 Mode  :character    Mode  :character   Median :40.72   Median :-73.96  
                                        Mean   :40.73   Mean   :-73.95  
                                        3rd Qu.:40.76   3rd Qu.:-73.94  
                                        Max.   :40.91   Max.   :-73.71  
                                                                        
  room_type             price         minimum_nights    number_of_reviews
 Length:48895       Min.   :    0.0   Min.   :   1.00   Min.   :  0.00   
 Class :character   1st Qu.:   69.0   1st Qu.:   1.00   1st Qu.:  1.00   
 Mode  :character   Median :  106.0   Median :   3.00   Median :  5.00   
                    Mean   :  152.7   Mean   :   7.03   Mean   : 23.27   
                    3rd Qu.:  175.0   3rd Qu.:   5.00   3rd Qu.: 24.00   
                    Max.   :10000.0   Max.   :1250.00   Max.   :629.00   
                                                                         
 last_review        reviews_per_month calculated_host_listings_count
 Length:48895       Min.   : 0.010    Min.   :  1.000               
 Class :character   1st Qu.: 0.190    1st Qu.:  1.000               
 Mode  :character   Median : 0.720    Median :  1.000               
                    Mean   : 1.373    Mean   :  7.144               
                    3rd Qu.: 2.020    3rd Qu.:  2.000               
                    Max.   :58.500    Max.   :327.000               
                    NA's   :10052                                   
 availability_365
 Min.   :  0.0   
 1st Qu.:  0.0   
 Median : 45.0   
 Mean   :112.8   
 3rd Qu.:227.0   
 Max.   :365.0   
                 

Checking for NAs in the data

### Using sum() and is.na() to count the missing values (NAs)
sum(is.na(NYCAB))
[1] 20141

Omitting the NAs from the data

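### na.omit() drops every row with an NA in any column (38,821 of 48,895 rows remain)
### The result is printed but not assigned, so NYCAB keeps all 48,895 rows below
na.omit(NYCAB)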
# A tibble: 38,821 × 16
      id name       host_id host_name neighbourhood_group neighbourhood latitude
   <dbl> <chr>        <dbl> <chr>     <chr>               <chr>            <dbl>
 1  2539 Clean & q…    2787 John      Brooklyn            Kensington        40.6
 2  2595 Skylit Mi…    2845 Jennifer  Manhattan           Midtown           40.8
 3  3831 Cozy Enti…    4869 LisaRoxa… Brooklyn            Clinton Hill      40.7
 4  5022 Entire Ap…    7192 Laura     Manhattan           East Harlem       40.8
 5  5099 Large Coz…    7322 Chris     Manhattan           Murray Hill       40.7
 6  5121 BlissArts…    7356 Garon     Brooklyn            Bedford-Stuy…     40.7
 7  5178 Large Fur…    8967 Shunichi  Manhattan           Hell's Kitch…     40.8
 8  5203 Cozy Clea…    7490 MaryEllen Manhattan           Upper West S…     40.8
 9  5238 Cute & Co…    7549 Ben       Manhattan           Chinatown         40.7
10  5295 Beautiful…    7702 Lena      Manhattan           Upper West S…     40.8
# ℹ 38,811 more rows
# ℹ 9 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
#   minimum_nights <dbl>, number_of_reviews <dbl>, last_review <chr>,
#   reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
#   availability_365 <dbl>
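The single total from sum(is.na(NYCAB)) does not show which columns the NAs are in. The summary() output above reports no NA's for price or minimum_nights and 10,052 for reviews_per_month. That suggests most missing values sit in review-related columns rather than in the variables analyzed here.

Below is a minimal base-R sketch of a per-column check followed by a targeted drop. It runs on a small made-up data frame; the values are illustrative, not rows of AB_NYC_2019.csv.

```r
# Toy data frame with the same kinds of columns as NYCAB (values are made up)
toy <- data.frame(
  price = c(149, 225, 150, 89),
  minimum_nights = c(1, 1, 3, 1),
  room_type = c("Private room", "Entire home/apt", "Private room", "Entire home/apt"),
  last_review = c("10/19/2018", "5/21/2019", NA, "7/5/2019"),
  stringsAsFactors = FALSE
)

# Number of NAs in each column, rather than one grand total
na_by_column <- colSums(is.na(toy))
print(na_by_column)

# Keep rows that are complete in the analysis variables only
analysis_vars <- c("price", "minimum_nights", "room_type")
toy_clean <- toy[complete.cases(toy[, analysis_vars]), ]

# na.omit() drops the third listing only because last_review is missing
nrow(na.omit(toy))  # 3
nrow(toy_clean)     # 4
```

On the real data, the same pattern would start with colSums(is.na(NYCAB)). tidyr::drop_na(NYCAB, price, minimum_nights) is a tidyverse-style alternative that drops rows only when the named columns are missing.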

Performing Linear Regression Analysis

### Using lm to perform regression analysis
### The predictor is minimum_nights
lm_model <- lm(price ~ minimum_nights, data = NYCAB)

Looking at the summary of the linear regression

summary(lm_model)

Call:
lm(formula = price ~ minimum_nights, data = NYCAB)

Residuals:
   Min     1Q Median     3Q    Max 
-595.6  -84.2  -46.7   24.8 9848.3 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    149.1978     1.1470 130.070   <2e-16 ***
minimum_nights   0.5011     0.0529   9.472   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 239.9 on 48893 degrees of freedom
Multiple R-squared:  0.001832,  Adjusted R-squared:  0.001811 
F-statistic: 89.73 on 1 and 48893 DF,  p-value: < 2.2e-16

Equation for the model (y = mx + b): price = 0.5011 × minimum_nights + 149.20
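Plugging values into the fitted equation makes the slope concrete. A 30-night minimum stay adds about 30 × 0.5011 ≈ $15 to the intercept.

With the fitted model object, predict(lm_model, newdata = data.frame(minimum_nights = c(1, 30))) would give the same predictions. The sketch below uses the printed coefficients instead, so it runs without the data file.

```r
# Coefficients copied from the summary(lm_model) output above
intercept <- 149.1978
slope <- 0.5011

# Predicted nightly price (in dollars) for a given minimum stay
predict_price <- function(minimum_nights) {
  intercept + slope * minimum_nights
}

predict_price(c(1, 30))  # 149.6989 164.2308
```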

P-values:

Both the intercept and the coefficient for minimum_nights are statistically significant (p < 0.001). This suggests that minimum nights is associated with price, although with almost 49,000 observations even a very small effect reaches statistical significance.

Adjusted R-squared:

The adjusted R-squared value is 0.001811, which means that only about 0.18% of the variance in price is explained by the number of minimum nights. This value is very low, suggesting that the model is not a good fit for the data.
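Adjusted R-squared corrects R-squared for the number of predictors p, given n observations:

adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1)

The worked computation below plugs in the values printed by summary(lm_model). The tiny difference from the printed 0.001811 comes from R-squared being rounded to 0.001832 in the output.

```r
# Values printed by summary(lm_model) above
r2 <- 0.001832  # Multiple R-squared
n  <- 48895     # listings used in the fit (48893 residual df + 2 coefficients)
p  <- 1         # number of predictors (minimum_nights)

# Adjusted R-squared penalizes R-squared for each added predictor
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
adj_r2  # about 0.00181, i.e. roughly 0.18% of the variance in price
```

With one predictor and almost 49,000 rows the penalty is negligible, so adjusted and multiple R-squared are nearly identical here.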

Analysis:

The coefficient estimate for minimum nights suggests that, on average, each additional required minimum night is associated with a $0.50 higher price. However, the low adjusted R-squared value indicates that the model explains only a small share of the variability in price, so other factors not included in the model likely influence price much more. Overall, while minimum nights has a statistically significant effect on price, the model does a poor job of explaining price.
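One way to check whether other factors matter more is to add predictors such as room_type to the formula and compare adjusted R-squared. The sketch below does this on simulated data whose effect sizes are invented purely for illustration.

On the real data, the analogous call would be lm(price ~ minimum_nights + room_type + neighbourhood_group, data = NYCAB). Its results are not shown here.

```r
set.seed(110)
n <- 500

# Simulated listings: room type drives most of the price, minimum nights very little
room_type <- sample(c("Entire home/apt", "Private room", "Shared room"), n, replace = TRUE)
minimum_nights <- sample(1:30, n, replace = TRUE)
base_price <- c("Entire home/apt" = 200, "Private room" = 90, "Shared room" = 60)  # hypothetical
price <- unname(base_price[room_type]) + 0.5 * minimum_nights + rnorm(n, sd = 40)
sim <- data.frame(price, minimum_nights, room_type)

# Same single-predictor model as above vs. a model that also includes room type
fit_simple <- lm(price ~ minimum_nights, data = sim)
fit_multi  <- lm(price ~ minimum_nights + room_type, data = sim)

c(simple = summary(fit_simple)$adj.r.squared,
  multi  = summary(fit_multi)$adj.r.squared)
```

lm() converts a character predictor such as room_type to a factor. It then estimates one coefficient per non-reference level, so each room type gets its own price offset relative to "Entire home/apt".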

Exploring: First Visualization

### Grouping by neighbourhood_group and taking the average price

NYCAB_avg_price <- NYCAB |>
  group_by(neighbourhood_group) |>
  summarise(avg_price = mean(price))

 ### Setting custom colors
custom_colors <- c("#FF6F61", "#6B5B95", "#88B04B", "#F7CAC9", "#9A46B2")

### Making a bar chart using plotly
### Adding tooltips

bar_chart <- plot_ly(data = NYCAB_avg_price, 
                     x = ~neighbourhood_group, 
                     y = ~avg_price, 
                     type = "bar",
                     text = ~paste0("Neighbourhood Group: ", neighbourhood_group, "<br>Average Price: $", round(avg_price, 2)),
                     hoverinfo = "text",
                     marker = list(color = custom_colors)) |>
  layout(title = "Average Price by Neighbourhood Group",
         xaxis = list(title = "Neighbourhood Group"),
         yaxis = list(title = "Average Price"),
         hoverlabel = list(bgcolor = "white", font = list(family = "Arial", size = 12)), 
         margin = list(b = 80),  # Extra bottom margin to fit the source note
         annotations = list(
           list(
             x = 0.5,
             y = -0.25,
             xref = "paper",
             yref = "paper",
             text = "Source: NYC Airbnb 2019",
             showarrow = FALSE,
             font = list(family = "Arial", size = 10)
           )))

bar_chart

The bar chart shows the average price of Airbnb listings in each NYC neighbourhood group, with each bar drawn in a custom color. Average prices vary considerably across neighbourhood groups, which is useful information on pricing dynamics for potential Airbnb hosts and travelers.
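The mean is sensitive to outliers. summary() reports a maximum price of $10,000 and a mean of $152.70, against a median of $106, so a group's average can be pulled up by a handful of very expensive listings.

A quick base-R check is to compute both the mean and the median per group. The prices below are made up to show the effect.

```r
# Made-up prices for two neighbourhood groups; one Brooklyn listing is an extreme outlier
toy <- data.frame(
  neighbourhood_group = c(rep("Brooklyn", 5), rep("Queens", 5)),
  price = c(80, 95, 100, 120, 10000, 60, 70, 75, 80, 90)
)

# Mean and median price per neighbourhood group
avg <- aggregate(price ~ neighbourhood_group, data = toy, FUN = mean)
med <- aggregate(price ~ neighbourhood_group, data = toy, FUN = median)

# Side-by-side comparison: the outlier inflates Brooklyn's mean but not its median
merge(avg, med, by = "neighbourhood_group", suffixes = c("_mean", "_median"))
```

In the dplyr pipeline above, adding median_price = median(price) inside summarise() would give the same comparison for the real data.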

Exploring: Second Visualization

### Filtering out the top 1% of prices (extreme outliers)

NYCAB_filtered <- NYCAB |>
  filter(price <= quantile(price, 0.99))  # Keeping prices at or below the 99th percentile
### Making a histogram using plotly
### Adding tooltips

histogram <- plot_ly(data = NYCAB_filtered, 
                     x = ~price, 
                     color = ~room_type,
                     type = "histogram",
                     text = ~paste0("Room Type: ", room_type, "<br>Price: $", price),
                     hoverinfo = "text") |>
  layout(title = "Distribution of Prices by Room Type",
         xaxis = list(title = "Price"),
         yaxis = list(title = "Frequency"),
         hoverlabel = list(bgcolor = "white", font = list(family = "Arial", size = 12)),
         font = list(family = "Arial", size = 12),
         margin = list(b = 80),  # Extra bottom margin to fit the source note
         annotations = list(
           list(
             x = 0.5,
             y = -0.25,
             xref = "paper",
             yref = "paper",
             text = "Source: NYC Airbnb 2019",
             showarrow = FALSE,
             font = list(family = "Arial", size = 10)
           )),
         barmode = "overlay",
         legend = list(title = "Room Type"))

histogram

The histogram shows the distribution of Airbnb listing prices by room type, excluding the top 1% of prices. Each room type (entire homes/apartments, private rooms, and shared rooms) is shown in a different color. This lets viewers compare the range and frequency of prices across room types in New York City’s Airbnb market.
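The filter behind this histogram keeps prices at or below the 99th percentile. quantile() uses type 7 linear interpolation by default, so the cutoff can fall between two observed prices.

A small worked example with made-up prices:

```r
# 100 made-up prices: 1 to 99 plus one extreme listing at 10,000
prices <- c(1:99, 10000)

# 99th percentile with R's default (type 7) interpolation:
# position 1 + 0.99 * (100 - 1) = 99.01, i.e. 1% of the way from 99 to 10000
cutoff <- quantile(prices, 0.99)
cutoff  # 99%: 198.01

# Same rule as filter(price <= quantile(price, 0.99)) in the code above
kept <- prices[prices <= cutoff]
length(kept)  # 99
```

Because filter() runs on the ungrouped data here, a single cutoff applies to all room types. Computing it within each room type would require a group_by() first.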

Exploring: Third Visualization

### Grouping by room type and neighbourhood group and taking the average price
avg_prices <- NYCAB_filtered |>
  group_by(room_type, neighbourhood_group) |>
  summarise(avg_price = mean(price), .groups = "drop")  # .groups = "drop" returns an ungrouped tibble
### Making a heat map with plotly
### Adding tooltips
heatmap <- plot_ly() |>
  add_heatmap(data = avg_prices,
              x = ~neighbourhood_group,
              y = ~room_type, 
              z = ~avg_price,
              colorscale = "Viridis",
              text = ~paste0("Room Type: ", room_type, "<br>Neighbourhood Group: ", neighbourhood_group, "<br>Average Price: $", round(avg_price, 2)),
              hoverinfo = "text",
              colorbar = list(title = "Average Price ($)"))|>
  layout(title = "Average Price by Room Type and Neighbourhood Group",
         xaxis = list(title = "Neighbourhood Group"),
         yaxis = list(title = "Room Type"),
         hoverlabel = list(bgcolor = "white", font = list(family = "Arial", size = 12)),
         font = list(family = "Arial", size = 12),
         annotations = list( ### Adding the source
           list(
             x = 0.5,
             y = -0.25,
             xref = "paper",
             yref = "paper",
             text = "Source: NYC Airbnb 2019",
             showarrow = FALSE,
             font = list(family = "Arial", size = 10)
           ))) 
heatmap

The heatmap shows the average price of Airbnb listings (excluding the top 1% of prices) for each combination of room type and neighbourhood group in NYC. Color represents the average price: on the Viridis scale, brighter yellow cells indicate higher prices and darker purple cells indicate lower prices. This lets viewers identify pricing patterns by both room type and neighbourhood group, which is useful information for potential Airbnb hosts and travelers.
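The heatmap is a colored two-way table of average prices. Printing that table as a matrix shows the exact values behind each cell and reveals combinations with no listings.

Below is a base-R sketch on made-up listings. It uses tapply() in place of the dplyr pipeline above.

```r
# Made-up listings (not from the data set)
toy <- data.frame(
  room_type = c("Private room", "Private room", "Entire home/apt",
                "Entire home/apt", "Shared room"),
  neighbourhood_group = c("Brooklyn", "Manhattan", "Brooklyn",
                          "Manhattan", "Manhattan"),
  price = c(70, 90, 150, 220, 60)
)

# Rows = room types, columns = neighbourhood groups, cells = mean price.
# Combinations with no listings come out as NA (an empty heatmap cell).
price_matrix <- with(toy, tapply(price, list(room_type, neighbourhood_group), mean))
round(price_matrix, 2)
```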

Final Visualization

library(leaflet)
library(htmltools)

### Converting room_type to factor
NYCAB$room_type <- factor(NYCAB$room_type)

### Making custom palette
room_colors <- c("pink", "black", "purple")

### Making popup
popup_content <- paste0(
  "<b>Price: </b>", "$",NYCAB$price, "<br>",
  "<b>Neighborhood: </b>", NYCAB$neighbourhood, "<br>",
  "<b>Minimum Nights: </b>", NYCAB$minimum_nights, "<br>",
  "<b>Neighborhood Group: </b>", NYCAB$neighbourhood_group, "<br>"
)

### Making leaflet map for NYC
### adding the locations for the airbnbs 
map <- leaflet() |>
  setView(lng = -73.935242, lat = 40.730610, zoom = 10) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircleMarkers(
    data = NYCAB,
    lng = ~longitude,
    lat = ~latitude,
    radius = 2,
    color = ~colorFactor(palette = room_colors, levels = c("Entire home/apt", "Private room", "Shared room"))(room_type),
    labelOptions = labelOptions(
      style = list("font-weight" = "normal", padding = "3px 8px"),
      textsize = "15px",
      direction = "auto"
    ), ### adding the pop up
    popup = popup_content
  ) |>
  addLegend(
    position = "bottomright",
    colors = room_colors,
    labels = c("Entire home/apt", "Private room", "Shared room"),
    title = "Room Type"
  )

map

Essay

A significant drop in New York City’s population during the COVID-19 pandemic led to record-breaking rent hikes in Manhattan and Brooklyn, according to The City. Between June 2020 and June 2022, roughly 400,000 people left the city, The City reported. This runs opposite to the nationwide trend over the same period, when rents either stayed flat or fell.

Landlords were able to raise prices amid a declining population largely because of what one housing expert called “a perfect storm.” Market churn and changes in rent laws emboldened property owners and kept tenants from finding affordable places to live. At the same time, short-term rentals took up units that could have been used for permanent housing.

The Real Estate Board of New York (REBNY) also blamed a shortage of supply for exacerbating what it called a crisis. According to REBNY, residential construction dropped off precipitously after the 421-a tax abatement program expired. A spokesman for Mayor Bill de Blasio disputed this claim but did not provide any evidence or numbers refuting it when asked by The City.

The visualizations show that certain areas of NYC are more expensive than others, even when smaller rooms are rented. The final visualization maps NYC Airbnb listings colored by room type, with a popup showing each listing’s price. This can tell us in which NYC neighborhoods it is more affordable to stay.

However, the visualization would be more helpful with greater context. For instance, what effect did particular policy changes or demographic shifts have on these prices? Including information about rental vacancy rates or housing construction trends might also offer better insight into what drives New York City’s rental market. It would also have been better to include data from 2019 through 2022, a period that covers the height of the pandemic.
In general, while these graphics capture differences in listing prices well enough, they lack depth because they do not explore the surrounding variables that could have made them more explanatory and relevant to their context. One thing I wanted to do but did not have enough time for was to group NYC into its neighborhoods, color each neighborhood by its most prevalent room type, and show the average price for each room type.
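Below is a minimal sketch of the neighborhood summary described above, run on made-up listings. It finds each neighbourhood’s most common room type and its average price per room type. Drawing the result as a colored map would also need neighborhood boundary data, which is not part of this data set.

```r
# Made-up listings (not from the data set)
toy <- data.frame(
  neighbourhood = c("Harlem", "Harlem", "Harlem", "Midtown", "Midtown", "Kensington"),
  room_type = c("Private room", "Private room", "Entire home/apt",
                "Entire home/apt", "Entire home/apt", "Private room"),
  price = c(70, 80, 150, 250, 230, 60)
)

# Count listings per neighbourhood (rows) and room type (columns)
counts <- table(toy$neighbourhood, toy$room_type)

# Most common room type per neighbourhood (which.max breaks ties by the first column)
most_common <- colnames(counts)[apply(counts, 1, which.max)]
names(most_common) <- rownames(counts)
most_common

# Average price for each neighbourhood and room type combination
avg_price <- aggregate(price ~ neighbourhood + room_type, data = toy, FUN = mean)
avg_price
```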

Reference: https://www.thecity.nyc/2023/08/04/why-is-nyc-rent-so-high/

Note: ChatGPT was used to fix errors.