Loading Libraries & Dataset

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
 
setwd("~/Documents/EC/Spring 2026/DATA 110")
 
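# Note: read_excel() only reads Excel workbooks, so this call works only if the
# file is really an .xls/.xlsx file despite its .csv extension; a true CSV
# would need readr::read_csv() instead.
airbnb <- read_excel("Airbnb_DC_25.csv")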

head(airbnb)
## # A tibble: 6 × 18
##      id name        host_id host_name neighbourhood_group neighbourhood latitude
##   <dbl> <chr>         <dbl> <chr>     <lgl>               <chr>            <dbl>
## 1  3686 Vita's Hid…    4645 Vita      NA                  Historic Ana…     38.9
## 2  3943 Historic R…    5059 Vasa      NA                  Edgewood, Bl…     38.9
## 3  4197 Capitol Hi…    5061 Sandra    NA                  Capitol Hill…     38.9
## 4  4529 Bertina's …    5803 Bertina   NA                  Eastland Gar…     38.9
## 5  5589 Cozy apt i…    6527 Ami       NA                  Kalorama Hei…     38.9
## 6  7103 Lovely gue…   17633 Charlotte NA                  Spring Valle…     38.9
## # ℹ 11 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
## #   minimum_nights <dbl>, number_of_reviews <dbl>, last_review <dttm>,
## #   reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
## #   availability_365 <dbl>, number_of_reviews_ltm <dbl>, license <chr>

Focusing On Two Variables: room_type & price

Computed the mean price for each room type with summarise(), excluding NA values, and sorted the result from highest to lowest.

airbnb_avg <- airbnb |>
  group_by(room_type) |>
  summarise(avg_price = mean(price, na.rm=TRUE)) |>
  arrange(desc(avg_price))
airbnb_avg
## # A tibble: 4 × 2
##   room_type       avg_price
##   <chr>               <dbl>
## 1 Shared room         1455.
## 2 Hotel room           346.
## 3 Entire home/apt      181.
## 4 Private room         103.

Bar Graph of Average Price by Room Type

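# Use fill (not color) so the bars themselves are shaded by room type;
# color would only change the bar outlines.
ggplot(airbnb_avg, aes(x = room_type, y = avg_price, fill = room_type)) +
  geom_col() +  # equivalent to geom_bar(stat = "identity")
  labs(title = "Average Price by Room Type",
    x = "Room Type",
    y = "Average Price",
    fill = "Room Type",
    caption = "Dataset: Airbnb_DC_25.csv")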

Write a short paragraph (3–5 sentences):

I created a bar graph of average price (avg_price) by room type (room_type). I calculated each room type's average by taking the mean of price within each group, excluding NA values. The graph shows that shared rooms have by far the highest average price, which was surprising to me. I would have expected entire homes to have the highest average price, followed by shared rooms, hotel rooms, and private rooms. I believe the result is due to the dataset containing few shared rooms, two of which are priced at 7000 dollars, which pulls the shared-room average up sharply.
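
One way to check this explanation is to put the number of listings and the median price next to the mean for each room type. The sketch below uses a small made-up tibble (`toy`, which is not the real Airbnb data) so that it runs on its own. The same pipeline should work on `airbnb`, assuming `room_type` and `price` are the columns shown in the output above.

```r
library(dplyr)

# Hypothetical toy data, NOT the real listings: a handful of shared rooms,
# two of them with an extreme price, to mimic the suspected pattern.
toy <- tibble(
  room_type = c(rep("Shared room", 6), rep("Entire home/apt", 6)),
  price     = c(40, 45, 50, 55, 7000, 7000, 150, 160, 175, 180, 200, NA)
)

toy_summary <- toy |>
  group_by(room_type) |>
  summarise(
    n_listings   = sum(!is.na(price)),          # listings with a known price
    avg_price    = mean(price, na.rm = TRUE),   # pulled up by extreme values
    median_price = median(price, na.rm = TRUE)  # much less affected by them
  ) |>
  arrange(desc(avg_price))

toy_summary
```

If the real data behaves like the toy data, the shared-room group would show a small `n_listings` and a median far below its mean, which would support the outlier explanation.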