library(readxl)
library(tidyverse)
## Warning: package 'ggplot2' was built under R version 4.5.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
df <- read_excel("Airbnb_DC_25.csv")
room_price <- df |>
  group_by(room_type) |>
  summarize(avg_price = mean(price, na.rm = TRUE))
head(room_price)
## # A tibble: 4 × 2
##   room_type       avg_price
##   <chr>               <dbl>
## 1 Entire home/apt      181.
## 2 Hotel room           346.
## 3 Private room         103.
## 4 Shared room         1455.
ggplot(room_price, aes(x = room_type, y = avg_price, fill = room_type)) +
  geom_bar(stat = "identity") +
  labs(
    title = "Average Airbnb Price by Room Type in Washington DC",
    x = "Room Type",
    y = "Average Price (USD)",
    caption = "Data Source: Airbnb_DC_25 Dataset"
  ) +
  theme_minimal()
# Reflection
This visualization is a bar graph showing the average Airbnb price by room type in Washington, D.C. I mapped fill to room_type to give each bar its own color and to create the legend, and I set stat = "identity" so that the bar heights use the average prices already calculated in room_price instead of counting rows. Each bar represents one room type, and the colors help tell the room types apart. The x-axis shows the room type and the y-axis shows the average price in USD.

One key insight from this plot is that shared rooms have the highest average price (about $1,455), while private rooms have the lowest (about $103). This is surprising, because shared rooms are usually expected to be the cheapest option. I think it happened because the room types are not equally common in Washington, D.C. If one room type has only a few listings, one or two very expensive listings could pull its average far up, while a room type with many listings would have a more stable average. I did not check how many listings each room type has, so this explanation is a guess and not something the plot itself shows.
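
One way to check the explanation in the reflection is to count the listings in each room type and compare the mean price with the median. The sketch below assumes `df` is the data frame loaded earlier and that it has the `room_type` and `price` columns used above. The names `room_summary`, `n_listings`, `median_price`, and `max_price` are introduced here only for illustration.

```r
library(dplyr)

# Assumes `df` was loaded earlier and has `room_type` and `price` columns.
room_summary <- df |>
  group_by(room_type) |>
  summarize(
    n_listings   = n(),                          # number of listings in the group
    avg_price    = mean(price, na.rm = TRUE),    # sensitive to extreme prices
    median_price = median(price, na.rm = TRUE),  # much less sensitive to extreme prices
    max_price    = max(price, na.rm = TRUE)      # largest single price in the group
  ) |>
  arrange(desc(avg_price))                       # most expensive room type first

room_summary
```

If a room type has few listings and a median far below its mean, a small number of expensive listings is probably driving its average.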