Assignment 5: Airbnb Data

Author

Rin Hwang

#install.packages("readxl")

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)

setwd("C:/Users/hwang/OneDrive/Documents/MC stuff/Spring 2026/DATA 110 Data Visualization and Communication/Assignments/Assignment 5")

df <- read_excel("Airbnb_DC_25.csv")
df
# A tibble: 6,257 × 18
      id name       host_id host_name neighbourhood_group neighbourhood latitude
   <dbl> <chr>        <dbl> <chr>     <lgl>               <chr>            <dbl>
 1  3686 Vita's Hi…    4645 Vita      NA                  Historic Ana…     38.9
 2  3943 Historic …    5059 Vasa      NA                  Edgewood, Bl…     38.9
 3  4197 Capitol H…    5061 Sandra    NA                  Capitol Hill…     38.9
 4  4529 Bertina's…    5803 Bertina   NA                  Eastland Gar…     38.9
 5  5589 Cozy apt …    6527 Ami       NA                  Kalorama Hei…     38.9
 6  7103 Lovely gu…   17633 Charlotte NA                  Spring Valle…     38.9
 7 11785 Sanctuary…   32015 Teresa    NA                  Cathedral He…     38.9
 8 12442 Peaches &…   32015 Teresa    NA                  Cathedral He…     38.9
 9 13744 Heart of …   53927 Victoria  NA                  Columbia Hei…     38.9
10 14218 Quiet Com…   32015 Teresa    NA                  Cathedral He…     38.9
# ℹ 6,247 more rows
# ℹ 11 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
#   minimum_nights <dbl>, number_of_reviews <dbl>, last_review <dttm>,
#   reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
#   availability_365 <dbl>, number_of_reviews_ltm <dbl>, license <chr>
colnames(df)
 [1] "id"                             "name"                          
 [3] "host_id"                        "host_name"                     
 [5] "neighbourhood_group"            "neighbourhood"                 
 [7] "latitude"                       "longitude"                     
 [9] "room_type"                      "price"                         
[11] "minimum_nights"                 "number_of_reviews"             
[13] "last_review"                    "reviews_per_month"             
[15] "calculated_host_listings_count" "availability_365"              
[17] "number_of_reviews_ltm"          "license"                       
df %>%
  count(neighbourhood, sort = TRUE)
# A tibble: 39 × 2
   neighbourhood                                                               n
   <chr>                                                                   <int>
 1 Union Station, Stanton Park, Kingman Park                                 687
 2 Capitol Hill, Lincoln Park                                                587
 3 Dupont Circle, Connecticut Avenue/K Street                                448
 4 Columbia Heights, Mt. Pleasant, Pleasant Plains, Park View                436
 5 Edgewood, Bloomingdale, Truxton Circle, Eckington                         409
 6 Brightwood Park, Crestwood, Petworth                                      368
 7 Shaw, Logan Circle                                                        361
 8 Downtown, Chinatown, Penn Quarters, Mount Vernon Square, North Capitol…   324
 9 Ivy City, Arboretum, Trinidad, Carver Langston                            273
10 Howard University, Le Droit Park, Cardozo/Shaw                            214
# ℹ 29 more rows
df %>%
  count(room_type, sort = TRUE)
# A tibble: 4 × 2
  room_type           n
  <chr>           <int>
1 Entire home/apt  4863
2 Private room     1305
3 Hotel room         74
4 Shared room        15
bar_plot_df <- df %>%
  filter(neighbourhood %in% c("Brightwood Park, Crestwood, Petworth", "Dupont Circle, Connecticut Avenue/K Street", "Near Southeast, Navy Yard")) %>%
  filter(price > 0 & price < 800) %>%
  select(price, neighbourhood, room_type) %>%
  group_by(neighbourhood, room_type) %>%
  summarize(avg_price = mean(price, na.rm = TRUE), .groups = "drop")
bar_plot_df
# A tibble: 8 × 3
  neighbourhood                              room_type       avg_price
  <chr>                                      <chr>               <dbl>
1 Brightwood Park, Crestwood, Petworth       Entire home/apt     136. 
2 Brightwood Park, Crestwood, Petworth       Private room         64.4
3 Dupont Circle, Connecticut Avenue/K Street Entire home/apt     184. 
4 Dupont Circle, Connecticut Avenue/K Street Hotel room          401. 
5 Dupont Circle, Connecticut Avenue/K Street Private room        150. 
6 Near Southeast, Navy Yard                  Entire home/apt     203. 
7 Near Southeast, Navy Yard                  Hotel room           28  
8 Near Southeast, Navy Yard                  Private room        139. 
ggplot(bar_plot_df, aes(x = neighbourhood, y = avg_price, fill = room_type)) +
  geom_col(position = "dodge", color = "white") + 
  scale_x_discrete(labels = scales::label_wrap(15)) + 
  labs(
    title = "Average Airbnb Price in Three D.C. Neighborhoods",
    x = "Neighborhood",
    y = "Average Price per Night ($USD)",
    fill = "Room Type", 
    caption = "Data Source: Airbnb_DC_25.csv"
  ) +
  theme_minimal() +
  theme(
    plot.caption = element_text(hjust = 0.5, size = 10),
    axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 9),
    plot.margin = margin(1, 1, 1, 1, "cm") 
  )

The visualization is a grouped bar chart showing the average nightly price by room type across three neighborhood clusters in Washington, D.C.: “Brightwood Park, Crestwood, Petworth” (Northwest D.C.), “Dupont Circle, Connecticut Avenue/K Street” (Northwest D.C.), and “Near Southeast, Navy Yard” (Southeast D.C.). Listings priced at $0 or at $800 and above were excluded before averaging. A key insight is that “Near Southeast, Navy Yard” has the highest average price for entire homes (about $203), ahead of Dupont Circle (about $184) and Brightwood Park (about $136). The tallest bar overall is Dupont Circle's “Hotel room” category, at about $401. Only Dupont Circle and Navy Yard contain “Hotel room” listings, which suggests they are denser, more tourism-heavy areas, although Navy Yard's hotel-room average of $28 is far below every other category and may reflect very few listings. In contrast, the Brightwood Park cluster has no hotel-room listings in the filtered data and appears more residential, with only entire homes and private rooms.
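
One way to double-check which neighborhood leads each room type is to rank the averages from the summary table. The sketch below is only an illustration: it re-enters the rounded values printed for bar_plot_df by hand (an assumption made so the snippet runs without the data file), and the names avg_prices and top_by_room_type are hypothetical, not part of the original analysis.

```r
library(tidyverse)

# Rounded averages copied by hand from the bar_plot_df printout above.
# In the real analysis, bar_plot_df itself could be used instead.
avg_prices <- tribble(
  ~neighbourhood,                               ~room_type,        ~avg_price,
  "Brightwood Park, Crestwood, Petworth",       "Entire home/apt", 136,
  "Brightwood Park, Crestwood, Petworth",       "Private room",     64.4,
  "Dupont Circle, Connecticut Avenue/K Street", "Entire home/apt", 184,
  "Dupont Circle, Connecticut Avenue/K Street", "Hotel room",      401,
  "Dupont Circle, Connecticut Avenue/K Street", "Private room",    150,
  "Near Southeast, Navy Yard",                  "Entire home/apt", 203,
  "Near Southeast, Navy Yard",                  "Hotel room",       28,
  "Near Southeast, Navy Yard",                  "Private room",    139
)

# Keep the single highest average price within each room type.
top_by_room_type <- avg_prices %>%
  group_by(room_type) %>%
  slice_max(avg_price, n = 1) %>%
  ungroup()

top_by_room_type
```

With these rounded values, Near Southeast, Navy Yard leads for entire homes, while Dupont Circle leads for hotel rooms and private rooms.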