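library(readxl)
# read_excel() falls back to the file's signature when the extension is not .xls/.xlsx,
# so this call succeeds only if the file is really an Excel workbook despite the .csv name
df <- read_excel("C:/Users/14408/Downloads/Airbnb_DC_25.csv")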
head(df)
## # A tibble: 6 × 18
## id name host_id host_name neighbourhood_group neighbourhood latitude
## <dbl> <chr> <dbl> <chr> <lgl> <chr> <dbl>
## 1 3686 Vita's Hid… 4645 Vita NA Historic Ana… 38.9
## 2 3943 Historic R… 5059 Vasa NA Edgewood, Bl… 38.9
## 3 4197 Capitol Hi… 5061 Sandra NA Capitol Hill… 38.9
## 4 4529 Bertina's … 5803 Bertina NA Eastland Gar… 38.9
## 5 5589 Cozy apt i… 6527 Ami NA Kalorama Hei… 38.9
## 6 7103 Lovely gue… 17633 Charlotte NA Spring Valle… 38.9
## # ℹ 11 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
## # minimum_nights <dbl>, number_of_reviews <dbl>, last_review <dttm>,
## # reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
## # availability_365 <dbl>, number_of_reviews_ltm <dbl>, license <chr>
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.5.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
df_summary <- df %>%
  group_by(neighbourhood) %>%
  summarize(avg_price = mean(price, na.rm = TRUE)) %>%
  slice_max(avg_price, n = 10)
df_summary
## # A tibble: 10 × 2
## neighbourhood avg_price
## <chr> <dbl>
## 1 Downtown, Chinatown, Penn Quarters, Mount Vernon Square, North Cap… 277.
## 2 West End, Foggy Bottom, GWU 264.
## 3 Howard University, Le Droit Park, Cardozo/Shaw 251.
## 4 Georgetown, Burleith/Hillandale 242.
## 5 Cathedral Heights, McLean Gardens, Glover Park 241.
## 6 Colonial Village, Shepherd Park, North Portal Estates 236.
## 7 Southwest Employment Area, Southwest/Waterfront, Fort McNair, Buzz… 229.
## 8 Hawthorne, Barnaby Woods, Chevy Chase 223.
## 9 Kalorama Heights, Adams Morgan, Lanier Heights 204.
## 10 Near Southeast, Navy Yard 190.
ggplot(df_summary, aes(x = reorder(neighbourhood, avg_price), y = avg_price)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Top 10 Most Expensive Airbnb Neighborhoods in Washington, DC",
    x = "Neighborhood",
    y = "Average Price ($)",
    caption = "Source: Airbnb_DC_25 dataset"
  ) +
  theme_minimal()
This visualization shows the 10 Washington, DC neighborhoods with the highest average Airbnb price. I used a horizontal bar chart, sorted by average price, so the long neighborhood names stay readable and the ranking is easy to see. I used group_by() and summarize() to calculate the average price for each neighborhood, ignoring missing prices, and then slice_max() to keep the 10 highest averages. The key pattern is the spread within the top 10: the averages run from about $277 in the Downtown/Chinatown area to about $190 in Near Southeast, Navy Yard, a gap of nearly $90. This suggests that location is strongly associated with Airbnb pricing in Washington, DC, although average price alone does not account for differences in room type between neighborhoods.
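To make the summary step easier to follow, here is a minimal sketch of the same group_by(), summarize(), and slice_max() pattern on a small made-up table. The `toy` data, its neighborhood labels "A" to "C", and the extra `n_priced` column are assumptions for illustration only and are not taken from the Airbnb_DC_25 dataset.

```r
library(dplyr)

# Hypothetical toy listings (illustration only, not the real data)
toy <- tibble(
  neighbourhood = c("A", "A", "B", "B", "C", "C"),
  price = c(100, 200, 300, NA, 50, 70)
)

toy_summary <- toy %>%
  group_by(neighbourhood) %>%
  summarize(
    avg_price = mean(price, na.rm = TRUE),  # missing prices are dropped before averaging
    n_priced = sum(!is.na(price))           # number of listings that actually have a price
  ) %>%
  slice_max(avg_price, n = 2)               # keep the 2 highest averages, sorted descending

toy_summary
```

Counting the priced listings next to each average is one way to check whether a high average rests on only a few listings. Note that slice_max() keeps ties by default, so it can return more than n rows when averages are equal.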