library(readxl)
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.5.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.5.2
df <- read_excel("Airbnb_DC_25.csv")
df
## # A tibble: 6,257 × 18
##       id name       host_id host_name neighbourhood_group neighbourhood latitude
##    <dbl> <chr>        <dbl> <chr>     <lgl>               <chr>            <dbl>
##  1  3686 Vita's Hi…    4645 Vita      NA                  Historic Ana…     38.9
##  2  3943 Historic …    5059 Vasa      NA                  Edgewood, Bl…     38.9
##  3  4197 Capitol H…    5061 Sandra    NA                  Capitol Hill…     38.9
##  4  4529 Bertina's…    5803 Bertina   NA                  Eastland Gar…     38.9
##  5  5589 Cozy apt …    6527 Ami       NA                  Kalorama Hei…     38.9
##  6  7103 Lovely gu…   17633 Charlotte NA                  Spring Valle…     38.9
##  7 11785 Sanctuary…   32015 Teresa    NA                  Cathedral He…     38.9
##  8 12442 Peaches &…   32015 Teresa    NA                  Cathedral He…     38.9
##  9 13744 Heart of …   53927 Victoria  NA                  Columbia Hei…     38.9
## 10 14218 Quiet Com…   32015 Teresa    NA                  Cathedral He…     38.9
## # ℹ 6,247 more rows
## # ℹ 11 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
## #   minimum_nights <dbl>, number_of_reviews <dbl>, last_review <dttm>,
## #   reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
## #   availability_365 <dbl>, number_of_reviews_ltm <dbl>, license <chr>
  1. Include at least one dplyr command, such as:

    filter(), arrange(), summarize(), group_by(), select(), or mutate().

  2. Include labels for both the x-axis and y-axis.

  3. Include a clear title and a caption indicating the data source.

  4. Use at least two colors in the plot.

  5. Include a legend explaining what the colors represent.

  6. Write a short paragraph (3–5 sentences) describing the visualization you created and one key insight or pattern you observe in the plot.

df |>
  group_by(room_type) |>
  summarise(avg_price = mean(price, na.rm = TRUE)) |>
  ggplot(aes(x = room_type, y = avg_price, fill = room_type)) +
  geom_col() +
  labs(
    title = "Average Airbnb Price by Room Type",
    x = "Room Type",
    y = "Average Price ($)",
    caption = "Source: Airbnb_DC_25.csv"
  ) +
  scale_fill_manual(values = c(
    "Private room" = "red",
    "Entire home/apt" = "orange",
    "Hotel room" = "yellow",
    "Shared room" = "green"
  ))

Answer: The graph is a bar plot visualizing average Airbnb prices by room type. “avg_price” is a new column I created by first grouping the listings by room type, then calculating the mean price within each group. I decided to use a bar plot because the variables I wanted to use were the unique room types in the dataset and price. These are categorical and numerical variables, respectively, and a bar plot is appropriate for comparing a numerical summary across categories. I found it interesting that the shared room type is (on average) the most expensive; I would have expected hotel rooms to be the most expensive, as they tend to have additional fees.
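
A possible follow-up check, sketched under assumptions: a mean can be pulled upward by a few very high prices, so comparing it with the median and the number of listings per room type may help show whether the pattern reflects typical listings. The data frame `toy` below and its prices are invented for illustration and are not taken from Airbnb_DC_25.csv. The same pipeline could be applied to `df` instead.

```r
library(dplyr)

# Hypothetical toy data (invented prices, NOT from Airbnb_DC_25.csv)
toy <- data.frame(
  room_type = c("Shared room", "Shared room", "Shared room",
                "Hotel room", "Hotel room", "Hotel room"),
  price = c(40, 50, 900, 150, 160, 170)
)

# Same group-then-summarise idea as above, with a count and a median added
toy_summary <- toy |>
  group_by(room_type) |>
  summarise(
    n_listings   = n(),                            # listings per room type
    avg_price    = mean(price, na.rm = TRUE),      # sensitive to outliers
    median_price = median(price, na.rm = TRUE)     # robust to outliers
  ) |>
  arrange(desc(avg_price))                         # highest average first

toy_summary
```

In this made-up example a single 900-dollar listing makes the shared rooms rank first by mean, while the median ranks the hotel rooms higher. Whether something similar happens in the real data can only be seen by running the summary on `df`.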