Dataset Description

This study uses the 2019 NYC Airbnb Open Data, which describes Airbnb listings across New York City. The dataset contains 48,895 listings and 16 columns, including key features such as price, number of reviews, minimum nights, and location, which makes a detailed examination of the Airbnb market possible.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data <- read_delim("./AB_NYC_2019.csv", delim = ",")
## Rows: 48895 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (5): name, host_name, neighbourhood_group, neighbourhood, room_type
## dbl  (10): id, host_id, latitude, longitude, price, minimum_nights, number_o...
## date  (1): last_review
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(data)
## # A tibble: 6 × 16
##      id name        host_id host_name neighbourhood_group neighbourhood latitude
##   <dbl> <chr>         <dbl> <chr>     <chr>               <chr>            <dbl>
## 1  2539 Clean & qu…    2787 John      Brooklyn            Kensington        40.6
## 2  2595 Skylit Mid…    2845 Jennifer  Manhattan           Midtown           40.8
## 3  3647 THE VILLAG…    4632 Elisabeth Manhattan           Harlem            40.8
## 4  3831 Cozy Entir…    4869 LisaRoxa… Brooklyn            Clinton Hill      40.7
## 5  5022 Entire Apt…    7192 Laura     Manhattan           East Harlem       40.8
## 6  5099 Large Cozy…    7322 Chris     Manhattan           Murray Hill       40.7
## # ℹ 9 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
## #   minimum_nights <dbl>, number_of_reviews <dbl>, last_review <date>,
## #   reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
## #   availability_365 <dbl>


Project Goal

The primary goal of this project is to examine the pricing dynamics of NYC Airbnb listings, with particular focus on how minimum-night requirements and guest reviews relate to price. The project also aims to identify insights that may benefit both hosts and potential guests, and to evaluate perceived value using metrics such as price per review.
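The price-per-review metric mentioned above is not defined in the text, so the following is only a minimal sketch under one plausible reading: price divided by `number_of_reviews`. The column name `price_per_review` and the toy rows are assumptions made for illustration; they are not taken from the dataset.

```r
library(dplyr)

# Toy rows standing in for the real listings (values are made up for illustration).
listings <- data.frame(
  price = c(150, 80, 200),
  number_of_reviews = c(30, 0, 10)
)

# price_per_review is a hypothetical column name. Listings with zero reviews
# get NA rather than Inf so they do not distort later summaries.
listings <- listings %>%
  mutate(price_per_review = if_else(number_of_reviews > 0,
                                    price / number_of_reviews,
                                    NA_real_))

print(listings)
```

Treating zero-review listings as missing is a design choice; an alternative would be to filter them out before computing the metric.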

Visualizations for Further Investigation

Price Distribution by Neighborhood

Room Type Distribution:

Plan Moving Forward

To Do List:

Initial Findings

Hypotheses

Hypothesis 1: Listings in popular tourist boroughs (e.g., Manhattan) have higher average prices than listings in less-visited boroughs (e.g., Staten Island).

  • Visualization: A box plot of price by neighborhood group can illustrate this idea. Note that a box plot shows medians and quartiles rather than means.

    ggplot(data, aes(x = neighbourhood_group, y = price)) +
      geom_boxplot() +
      labs(title = "Price Distribution by Neighborhood Group",
           x = "Neighborhood Group",
           y = "Price") +
      theme_minimal()
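Because Hypothesis 1 is stated in terms of average prices, a numeric summary may be a useful companion to the plot. The sketch below computes the mean and median price per neighborhood group. It runs on a small made-up data frame (`toy`, a hypothetical stand-in) so that it is self-contained; the same pipeline should apply to `data` as loaded earlier.

```r
library(dplyr)

# Toy rows standing in for the real data (values are made up for illustration).
toy <- data.frame(
  neighbourhood_group = c("Manhattan", "Manhattan", "Manhattan",
                          "Staten Island", "Staten Island"),
  price = c(100, 200, 900, 60, 100)
)

# Mean and median price per neighborhood group, highest mean first.
price_by_group <- toy %>%
  group_by(neighbourhood_group) %>%
  summarise(
    n = n(),
    mean_price = mean(price),
    median_price = median(price),
    .groups = "drop"
  ) %>%
  arrange(desc(mean_price))

print(price_by_group)
```

Reporting both statistics matters when prices are skewed: in the toy rows, a single expensive Manhattan listing pulls the mean well above the median.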

Hypothesis 2: Across all boroughs, entire home/apartment listings are priced higher than private rooms and shared rooms.

  • The reasoning is that guests are willing to pay more for private accommodation than for shared living spaces.

  • Visualization: A violin plot can show the distribution of prices across room types.

    ggplot(data, aes(x = room_type, y = price)) + 
      geom_violin(fill = "lightgreen") + 
      theme_minimal() + 
      labs(title = "Price Distribution by Room Type", x = "Room Type", y = "Price")
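
The violin plot pools all boroughs together, while Hypothesis 2 makes a claim about each borough. One way to check the claim borough by borough is sketched below. It uses a made-up data frame (`toy`, a hypothetical stand-in for `data`), compares median prices, and flags whether entire homes rank highest in every borough. Using the median instead of the mean is an assumption of this sketch.

```r
library(dplyr)

# Toy rows standing in for the real data (values are made up for illustration).
toy <- data.frame(
  neighbourhood_group = rep(c("Brooklyn", "Manhattan"), each = 3),
  room_type = rep(c("Entire home/apt", "Private room", "Shared room"), times = 2),
  price = c(150, 70, 40, 220, 110, 65)
)

# Median price for each borough and room type combination.
medians <- toy %>%
  group_by(neighbourhood_group, room_type) %>%
  summarise(median_price = median(price), .groups = "drop")

# For each borough, find the room type with the highest median price
# and flag whether it is the entire home/apartment category.
check <- medians %>%
  group_by(neighbourhood_group) %>%
  summarise(
    top_room_type = room_type[which.max(median_price)],
    entire_home_highest = top_room_type == "Entire home/apt",
    .groups = "drop"
  )

print(check)
```

If `entire_home_highest` is TRUE for every borough in the real data, the result would be consistent with the hypothesis, although a formal test would still be needed to rule out chance differences.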