Loading libraries and dataset

library(readxl)
df <- read_excel("Airbnb_DC_25.xlsx")
df  
## # A tibble: 6,257 × 18
##       id name       host_id host_name neighbourhood_group neighbourhood latitude
##    <dbl> <chr>        <dbl> <chr>     <lgl>               <chr>            <dbl>
##  1  3686 Vita's Hi…    4645 Vita      NA                  Historic Ana…     38.9
##  2  3943 Historic …    5059 Vasa      NA                  Edgewood, Bl…     38.9
##  3  4197 Capitol H…    5061 Sandra    NA                  Capitol Hill…     38.9
##  4  4529 Bertina's…    5803 Bertina   NA                  Eastland Gar…     38.9
##  5  5589 Cozy apt …    6527 Ami       NA                  Kalorama Hei…     38.9
##  6  7103 Lovely gu…   17633 Charlotte NA                  Spring Valle…     38.9
##  7 11785 Sanctuary…   32015 Teresa    NA                  Cathedral He…     38.9
##  8 12442 Peaches &…   32015 Teresa    NA                  Cathedral He…     38.9
##  9 13744 Heart of …   53927 Victoria  NA                  Columbia Hei…     38.9
## 10 14218 Quiet Com…   32015 Teresa    NA                  Cathedral He…     38.9
## # ℹ 6,247 more rows
## # ℹ 11 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
## #   minimum_nights <dbl>, number_of_reviews <dbl>, last_review <dttm>,
## #   reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
## #   availability_365 <dbl>, number_of_reviews_ltm <dbl>, license <chr>
library(dplyr)
library(ggplot2)

Removing NAs

df <- df |>
  filter(!is.na(price) & price > 0)

I removed rows where the price is missing or not greater than zero.
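As a small check of what that filter does, the sketch below applies the same condition to a made-up tibble (the `toy` data and its values are hypothetical, not taken from the Airbnb file). It also shows that `filter()` already drops rows whose condition evaluates to `NA`, so the explicit `!is.na(price)` mainly documents intent.

```r
library(dplyr)

# Hypothetical prices: one missing, one zero, two valid
toy <- tibble(id = 1:4, price = c(120, NA, 0, 85))

# Same condition as the filter above
kept <- toy |>
  filter(!is.na(price) & price > 0)

# filter() drops rows whose condition is NA, so this keeps the same rows
kept_short <- toy |>
  filter(price > 0)

# Number of rows removed by the filter
removed <- nrow(toy) - nrow(kept)
removed
```

Comparing `nrow()` before and after the filter in this way is a quick method to report how many listings were dropped.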

Filtering out hotel rooms

df1 <- df |>
  filter(room_type %in% c("Private room", "Entire home/apt", "Shared room"))
head(df1)
## # A tibble: 6 × 18
##      id name        host_id host_name neighbourhood_group neighbourhood latitude
##   <dbl> <chr>         <dbl> <chr>     <lgl>               <chr>            <dbl>
## 1  3686 Vita's Hid…    4645 Vita      NA                  Historic Ana…     38.9
## 2  3943 Historic R…    5059 Vasa      NA                  Edgewood, Bl…     38.9
## 3  4197 Capitol Hi…    5061 Sandra    NA                  Capitol Hill…     38.9
## 4  4529 Bertina's …    5803 Bertina   NA                  Eastland Gar…     38.9
## 5  7103 Lovely gue…   17633 Charlotte NA                  Spring Valle…     38.9
## 6 11785 Sanctuary …   32015 Teresa    NA                  Cathedral He…     38.9
## # ℹ 11 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
## #   minimum_nights <dbl>, number_of_reviews <dbl>, last_review <dttm>,
## #   reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
## #   availability_365 <dbl>, number_of_reviews_ltm <dbl>, license <chr>
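
One way to confirm that only the three intended room types remain is to count listings per type after filtering. The sketch below uses a small made-up tibble (hypothetical values and the hypothetical names `toy`, `toy1`, and `keep_types`), since the idea is the same on the full data.

```r
library(dplyr)

# Hypothetical listings, including one hotel room
toy <- tibble(
  room_type = c("Entire home/apt", "Private room", "Hotel room",
                "Shared room", "Entire home/apt"),
  price = c(150, 75, 200, 80, 180)
)

keep_types <- c("Private room", "Entire home/apt", "Shared room")

# Keep only the listed room types
toy1 <- toy |>
  filter(room_type %in% keep_types)

# Listings per remaining room type
counts <- toy1 |>
  count(room_type)
counts
```

On the real data, `count(df1, room_type)` would give the same kind of table.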

Boxplot

ggplot(df1, aes(x = room_type, y = price, fill = room_type)) +
  geom_boxplot() +
  labs(title = "Distribution of Airbnb Prices by Room Type in DC",
       x = "Room Type",
       y = "Price",
       fill = "Room Type",
       caption = "Data Source: Airbnb_DC_25.xlsx") +
  scale_fill_manual(values = c("Private room" = "skyblue",
                               "Entire home/apt" = "orange",
                               "Shared room" = "lightgreen")) +
  theme_minimal()

A few extreme outliers stretch the y-axis, so the boxes look compressed.
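
To see how a single extreme value creates this effect, the sketch below applies the 1.5 × IQR rule, which is what `geom_boxplot()` uses by default to decide which points are drawn as outliers, to a made-up price vector (the values are hypothetical).

```r
# Hypothetical nightly prices with one extreme listing
prices <- c(50, 60, 75, 80, 90, 100, 120, 150, 5000)

# First and third quartiles and the interquartile range
q <- quantile(prices, c(0.25, 0.75))
iqr <- q[2] - q[1]

# Points beyond 1.5 * IQR from the quartiles count as outliers
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr
outliers <- prices[prices < lower_fence | prices > upper_fence]

outliers
range(prices)  # a linear axis has to span this whole range
```

Because a linear axis must cover the full range, one very expensive listing leaves little vertical room for the boxes themselves.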

Fixed Boxplot

ggplot(df1, aes(x = room_type, y = price, fill = room_type)) +
  geom_boxplot() +
  labs(title = "Distribution of Airbnb Prices by Room Type in DC",
       x = "Room Type",
       y = "Price",
       fill = "Room Type",
       caption = "Data Source: Airbnb_DC_25.xlsx") +
  scale_fill_manual(values = c("Private room" = "skyblue",
                               "Entire home/apt" = "orange",
                               "Shared room" = "lightgreen")) +
  scale_y_log10() +
  theme_minimal()

This plot shows the prices of Airbnb listings in Washington, DC by room type: entire home/apt, private room, or shared room. I used a boxplot to show the price distribution for each type. To make the prices easier to compare, I put the y-axis on a log10 scale. On this scale, equal distances represent equal ratios, so the major tick marks go up by factors of 10 (10, 100, 1,000) instead of by a fixed dollar amount. The log scale shows both low and high prices clearly, even though some listings are very expensive. The median price is around $150 for entire homes or apartments, about $75 for private rooms, and about $80 for shared rooms (the median line sits slightly higher than the one for private rooms). However, the boxplot still shows outliers, especially for entire homes/apartments and private rooms.
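
The medians quoted above were read off the plot. A sketch of how they could be computed directly, along with where each one would sit on a log10 axis, is shown below on a made-up tibble (the `toy` data is hypothetical and chosen only for illustration, so its medians are not the real ones).

```r
library(dplyr)

# Hypothetical listings
toy <- tibble(
  room_type = c("Entire home/apt", "Entire home/apt", "Entire home/apt",
                "Private room", "Private room", "Private room"),
  price = c(100, 1000, 150, 10, 100, 75)
)

medians <- toy |>
  group_by(room_type) |>
  summarise(median_price = median(price)) |>
  # Position of each median on a log10 axis
  mutate(log10_position = log10(median_price))

medians
```

On the real data, replacing `toy` with `df1` would give exact medians to compare with the values estimated from the plot.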