Assignment 3

library(readxl)

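# Note: the file has a .csv extension but is read with read_excel(); this works only
# if it is really an Excel workbook. For a true CSV, readr::read_csv() is the usual choice.
df <- read_excel("C:/Users/14408/Downloads/Airbnb_DC_25.csv")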

head(df)

## # A tibble: 6 × 18
##      id name        host_id host_name neighbourhood_group neighbourhood latitude
##   <dbl> <chr>         <dbl> <chr>     <lgl>               <chr>            <dbl>
## 1  3686 Vita's Hid…    4645 Vita      NA                  Historic Ana…     38.9
## 2  3943 Historic R…    5059 Vasa      NA                  Edgewood, Bl…     38.9
## 3  4197 Capitol Hi…    5061 Sandra    NA                  Capitol Hill…     38.9
## 4  4529 Bertina's …    5803 Bertina   NA                  Eastland Gar…     38.9
## 5  5589 Cozy apt i…    6527 Ami       NA                  Kalorama Hei…     38.9
## 6  7103 Lovely gue…   17633 Charlotte NA                  Spring Valle…     38.9
## # ℹ 11 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
## #   minimum_nights <dbl>, number_of_reviews <dbl>, last_review <dttm>,
## #   reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
## #   availability_365 <dbl>, number_of_reviews_ltm <dbl>, license <chr>

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.5.3

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

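# Average price per neighborhood, keeping the 10 highest averages
df_summary <- df %>%
  group_by(neighbourhood) %>%
  summarize(avg_price = mean(price, na.rm = TRUE)) %>%  # skip listings with no price
  slice_max(avg_price, n = 10)                          # sorted from highest to lowest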

df_summary

## # A tibble: 10 × 2
##    neighbourhood                                                       avg_price
##    <chr>                                                                   <dbl>
##  1 Downtown, Chinatown, Penn Quarters, Mount Vernon Square, North Cap…      277.
##  2 West End, Foggy Bottom, GWU                                              264.
##  3 Howard University, Le Droit Park, Cardozo/Shaw                           251.
##  4 Georgetown, Burleith/Hillandale                                          242.
##  5 Cathedral Heights, McLean Gardens, Glover Park                           241.
##  6 Colonial Village, Shepherd Park, North Portal Estates                    236.
##  7 Southwest Employment Area, Southwest/Waterfront, Fort McNair, Buzz…      229.
##  8 Hawthorne, Barnaby Woods, Chevy Chase                                    223.
##  9 Kalorama Heights, Adams Morgan, Lanier Heights                           204.
## 10 Near Southeast, Navy Yard                                                190.

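# reorder() sorts the bars by average price; coord_flip() makes them horizontal
# so the long neighborhood names stay readable
ggplot(df_summary, aes(x = reorder(neighbourhood, avg_price), y = avg_price)) +
  geom_col(fill = "steelblue") +
  coord_flip() +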
  labs(
    title = "Top 10 Most Expensive Airbnb Neighborhoods in Washington, DC",
    x = "Neighborhood",
    y = "Average Price ($)",
    caption = "Source: Airbnb_DC_25 dataset"
  ) +
  theme_minimal()

This visualization shows the 10 Airbnb neighborhoods in Washington, DC with the highest average listing price. I used a horizontal bar chart, ordered by average price, so the neighborhoods are easy to compare. To build it, I applied group_by() and summarize() to calculate the average price for each neighborhood, ignoring listings with a missing price, and then used slice_max() to keep the 10 highest averages. The main pattern is that average prices vary widely even within the top 10: the Downtown, Chinatown, Penn Quarters area averages about $277, while Near Southeast, Navy Yard averages about $190. This suggests that location plays a major role in Airbnb pricing in Washington, DC, although these averages do not account for other factors such as room type.
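
To make the group_by(), summarize(), and slice_max() steps easier to check by hand, here is a minimal sketch on a small made-up table. The toy data, the neighborhood labels "A", "B", "C", and the extra n_listings column are illustrative assumptions and are not part of the Airbnb_DC_25 dataset or the analysis above.

```r
library(dplyr)

# Hypothetical toy data: made-up prices, not taken from Airbnb_DC_25
toy <- tibble(
  neighbourhood = c("A", "A", "B", "B", "B", "C"),
  price         = c(100, 200, 50, NA, 70, 300)
)

toy_summary <- toy %>%
  group_by(neighbourhood) %>%                 # one group per neighborhood
  summarize(
    avg_price  = mean(price, na.rm = TRUE),   # missing prices are dropped before averaging
    n_listings = sum(!is.na(price))           # how many priced listings back each average
  ) %>%
  slice_max(avg_price, n = 2)                 # keep the 2 highest averages

toy_summary
```

Counting the listings behind each average is one possible sanity check: when a neighborhood has only a few listings, a single expensive one can pull its average up, so a count column like this may help in judging how much weight to give each bar.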

Jean Tcheby