Final Project

Author

Paul Daniel-Orie

Load the libraries used throughout the Final Project

library(tidyverse)
Warning: package 'readr' was built under R version 4.4.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly)
Warning: package 'plotly' was built under R version 4.4.3

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
library(RColorBrewer)
library(viridis)
Loading required package: viridisLite
library(GGally)
Warning: package 'GGally' was built under R version 4.4.3
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2
library(leaflet)
Warning: package 'leaflet' was built under R version 4.4.3

Load the dataset from the working directory into the global environment, then convert the column headers to lower case and replace spaces with underscores

setwd("C:/Users/Owner/OneDrive/Desktop/Data110")
# Suppress all messages when reading the CSV file
airbnb_2025 <- suppressMessages(read_csv("airbnb_washington_dc,2025.csv", show_col_types = FALSE))
# Standardize headers: lower-case them and replace spaces with underscores
names(airbnb_2025) <- gsub(" ", "_", tolower(names(airbnb_2025)))
head(airbnb_2025)
# A tibble: 6 × 18
     id name        host_id host_name neighbourhood_group neighbourhood latitude
  <dbl> <chr>         <dbl> <chr>     <lgl>               <chr>            <dbl>
1  3686 Vita's Hid…    4645 Vita      NA                  Historic Ana…     38.9
2  3943 Historic R…    5059 Vasa      NA                  Edgewood, Bl…     38.9
3  4197 Capitol Hi…    5061 Sandra    NA                  Capitol Hill…     38.9
4  4529 Bertina's …    5803 Bertina   NA                  Eastland Gar…     38.9
5  5589 Cozy apt i…    6527 Ami       NA                  Kalorama Hei…     38.9
6  7103 Lovely gue…   17633 Charlotte NA                  Spring Valle…     38.9
# ℹ 11 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
#   minimum_nights <dbl>, number_of_reviews <dbl>, last_review <date>,
#   reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
#   availability_365 <dbl>, number_of_reviews_ltm <dbl>, license <chr>
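
The header cleanup above can be checked on a small example. The sketch below uses a made-up tibble (its column names are illustrative and do not come from the Airbnb file) to show what the tolower()/gsub() step does, and that it replaces only spaces, not other punctuation.

```r
library(tibble)

# Hypothetical data frame with messy headers (not the Airbnb file)
toy <- tibble(`Host Name` = "Vita", `Room Type` = "Private room", `Price.USD` = 60)

# Same cleanup as above: lower-case first, then replace each space with "_"
names(toy) <- gsub(" ", "_", tolower(names(toy)))

names(toy)
# "host_name" "room_type" "price.usd"
```

The dot in the third header survives because the pattern matches a literal space only.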

Remove unwanted variables. The dropped variables matter to Airbnb, but I exclude them because they are not needed for my statistical analysis.

# Drop identifiers, host details, and other columns not used in the analysis
airbnb_dc2025 <- airbnb_2025 |>
  select(-c(neighbourhood_group, license, host_id, host_name, id, last_review))
head(airbnb_dc2025)
# A tibble: 6 × 12
  name           neighbourhood latitude longitude room_type price minimum_nights
  <chr>          <chr>            <dbl>     <dbl> <chr>     <dbl>          <dbl>
1 Vita's Hideaw… Historic Ana…     38.9     -77.0 Private …    60             31
2 Historic Rowh… Edgewood, Bl…     38.9     -77.0 Private …    63              1
3 Capitol Hill … Capitol Hill…     38.9     -77.0 Private …   128              4
4 Bertina's  Ho… Eastland Gar…     38.9     -76.9 Private …    64             30
5 Cozy apt in A… Kalorama Hei…     38.9     -77.0 Entire h…    NA             50
6 Lovely guest … Spring Valle…     38.9     -77.1 Entire h…    74             31
# ℹ 5 more variables: number_of_reviews <dbl>, reviews_per_month <dbl>,
#   calculated_host_listings_count <dbl>, availability_365 <dbl>,
#   number_of_reviews_ltm <dbl>
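
The preview shows a missing price in row 5, so it can help to count missing values per column before any analysis. A minimal sketch, using a small made-up tibble in place of airbnb_dc2025 (the name na_counts is introduced here for illustration):

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for airbnb_dc2025 (three rows, for illustration only)
toy <- tibble(
  room_type      = c("Private room", "Entire home/apt", "Entire home/apt"),
  price          = c(60, NA, 74),
  minimum_nights = c(31, 50, 31)
)

# Count NA values in every column
na_counts <- toy |>
  summarise(across(everything(), ~ sum(is.na(.x))))

na_counts
```

Running the same summarise() call on airbnb_dc2025 would show how many listings lack a price before averages or a regression are computed.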

Visualize booking frequency by location.

The leaflet package allows interactive mapping: clicking a listing on the map shows its details. Booking frequency is approximated here by the number of reviews in the last 12 months (number_of_reviews_ltm).

pal_reviews2 <- colorNumeric(
  palette  = magma(7),
  domain   = airbnb_dc2025$number_of_reviews_ltm,
  na.color = "transparent"
)

leaflet(data = airbnb_dc2025) |>
  setView(lng = -77.0369, lat = 38.9072, zoom = 12) |>
  addTiles() |>
  addCircles(
    lng         = ~longitude,
    lat         = ~latitude,
    # addCircles() measures radius in meters (addCircleMarkers() uses pixels)
    radius      = ~sqrt(number_of_reviews_ltm) * 2 + 2,
    color       = "#472988",
    fillColor   = ~pal_reviews2(number_of_reviews_ltm),
    fillOpacity = 0.7,
    # Popup shown when a listing is clicked
    popup       = ~paste0(
      "<b>Name:</b> ", name, "<br>",
      "<b>Neighborhood:</b> ", neighbourhood, "<br>",
      "<b>Price (USD):</b> $", price, "<br>",
      "<b>Reviews per Month:</b> ", round(reviews_per_month, 2), "<br>",
      "<b>Availability (days/year):</b> ", availability_365
    )
  ) |>
  addLegend(
    position = "bottomright",
    pal      = pal_reviews2,
    values   = ~number_of_reviews_ltm,
    title    = "Reviews (Last 12 mo)",
    opacity  = 1
  )
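
To see how the marker encoding behaves, the sketch below applies the same radius formula and a palette built the same way to a few made-up review counts. The counts are illustrative, and radius_for and pal are helper names introduced here. Note that addCircles() interprets radius in meters, whereas addCircleMarkers() uses pixels.

```r
library(leaflet)
library(viridis)

# Hypothetical review counts, including a missing value
reviews <- c(0, 25, 100, NA)

# Same formula as the map: the square root dampens very large counts
radius_for <- function(n) sqrt(n) * 2 + 2
radius_for(reviews)
# 2 12 22 NA

# Palette built as in the map; the domain range ignores NA values
pal <- colorNumeric(palette = magma(7), domain = reviews, na.color = "transparent")
pal(reviews)
```

With this formula a listing with 100 reviews gets a radius only 11 times larger than one with none, so heavily reviewed listings do not swamp the map.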

Reflections on Visualizations

What the visualizations represent:

Average Price by Neighborhood: Bar charts of the top and bottom DC neighborhoods show where nightly rates are highest and lowest, highlighting Dupont Circle’s premium pricing.

Interactive Booking Frequency Map: A Leaflet map with circles sized and colored by the number of reviews in the last 12 months flags hotspots of demand, particularly Dupont Circle, Capitol Hill, and Georgetown.

Interesting patterns or surprises:

The weak explanatory power of my regression (adjusted R² ≈ 1.2%) suggests that price is driven far more by unobserved factors, such as amenities, property size, interior quality, and exact location (e.g., proximity to Metro or monuments), than by the listing-level controls alone.

I also noticed that some outlier listings charge extraordinarily high rates (> $1,000/night), pulling up neighborhood averages and underscoring the long right tail of the price distribution.
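
The effect of a single very expensive listing on a neighborhood average can be illustrated with made-up numbers. In this sketch the neighborhood labels and prices are invented, and price_summary is a name introduced here; it compares the mean and median nightly price per group.

```r
library(dplyr)
library(tibble)

# Invented prices: neighborhood "A" contains one extreme listing
toy <- tibble(
  neighbourhood = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
  price         = c(90, 110, 120, 130, 1500, 95, 105, 120, 125, 155)
)

# Mean and median price per neighborhood
price_summary <- toy |>
  group_by(neighbourhood) |>
  summarise(
    mean_price   = mean(price, na.rm = TRUE),
    median_price = median(price, na.rm = TRUE)
  )

price_summary
```

Both groups have a median of 120, but the outlier lifts the mean of A to 390 while B stays at 120. This is one reason medians (or log prices) are often preferred for right-skewed price data.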

Limitations & Wish List

Expanded Use of Categorical Variables: Due to time constraints, I wasn’t able to fully incorporate all available predictors, such as the categorical room_type, the numeric calculated_host_listings_count, or detailed neighborhood groupings, which could have added valuable nuance to both the regression and the visualizations.
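
One way to bring a categorical predictor such as room_type into a model is to pass it to lm(), which expands it into indicator variables automatically. The sketch below uses simulated data (not the Airbnb listings), and its log-price specification is an assumption made here, not the model fitted in this project.

```r
set.seed(110)

# Simulated listings: all values are invented purely for illustration
n <- 200
toy <- data.frame(
  room_type      = sample(c("Entire home/apt", "Private room"), n, replace = TRUE),
  minimum_nights = sample(1:30, n, replace = TRUE)
)

# Assumption: entire homes cost more on average, plus random noise
toy$price <- exp(4.5 + 0.5 * (toy$room_type == "Entire home/apt") + rnorm(n, sd = 0.4))

# lm() treats the character column as a factor and creates dummy variables
fit <- lm(log(price) ~ room_type + minimum_nights, data = toy)

coef(fit)                    # baseline level is "Entire home/apt"
summary(fit)$adj.r.squared   # adjusted R-squared of this toy model
```

The coefficient named "room_typePrivate room" is the estimated difference in log price relative to the baseline room type, holding minimum_nights fixed.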

Incomplete Amenities Data: The Airbnb dataset used here has no columns for listing amenities (e.g., “wifi”, “kitchen”, “pool”, “air conditioning”), yet these features often drive price variation. A complete amenities checklist would likely improve the accuracy of any price-prediction model.

Bibliography

Airbnb. (2024, April 12). Our support for sensible, short-term rental policies for renters. Airbnb Newsroom. https://news.airbnb.com/our-support-for-sensible-short-term-rental-policies-for-renters/

OpenAI. (2023). ChatGPT (GPT-4) [Large language model]. https://chat.openai.com/