library(readxl)
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.5.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.5.2
df <- read_excel("Airbnb_DC_25.csv")
df
## # A tibble: 6,257 × 18
## id name host_id host_name neighbourhood_group neighbourhood latitude
## <dbl> <chr> <dbl> <chr> <lgl> <chr> <dbl>
## 1 3686 Vita's Hi… 4645 Vita NA Historic Ana… 38.9
## 2 3943 Historic … 5059 Vasa NA Edgewood, Bl… 38.9
## 3 4197 Capitol H… 5061 Sandra NA Capitol Hill… 38.9
## 4 4529 Bertina's… 5803 Bertina NA Eastland Gar… 38.9
## 5 5589 Cozy apt … 6527 Ami NA Kalorama Hei… 38.9
## 6 7103 Lovely gu… 17633 Charlotte NA Spring Valle… 38.9
## 7 11785 Sanctuary… 32015 Teresa NA Cathedral He… 38.9
## 8 12442 Peaches &… 32015 Teresa NA Cathedral He… 38.9
## 9 13744 Heart of … 53927 Victoria NA Columbia Hei… 38.9
## 10 14218 Quiet Com… 32015 Teresa NA Cathedral He… 38.9
## # ℹ 6,247 more rows
## # ℹ 11 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
## # minimum_nights <dbl>, number_of_reviews <dbl>, last_review <dttm>,
## # reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
## # availability_365 <dbl>, number_of_reviews_ltm <dbl>, license <chr>
Include at least one dplyr command, such as filter(), arrange(), summarize(), group_by(), select(), or mutate().
Include labels for both the x-axis and y-axis.
Include a clear title and a caption indicating the data source.
Use at least two colors in the plot.
Include a legend explaining what the colors represent.
Write a short paragraph (3–5 sentences) describing the visualization you created and one key insight or pattern you observe in the plot.
# Average price per room type, drawn as one colored bar per type
df |>
  group_by(room_type) |>
  summarise(avg_price = mean(price, na.rm = TRUE)) |>
  ggplot(aes(x = room_type, y = avg_price, fill = room_type)) +
  geom_col() +
  labs(
    title = "Average Airbnb Price by Room Type",
    x = "Room Type",
    y = "Average Price ($)",
    fill = "Room Type",  # readable legend title instead of the raw column name
    caption = "Source: Airbnb_DC_25.csv"
  ) +
  # One manually chosen color per room type
  scale_fill_manual(values = c(
    "Private room" = "red",
    "Entire home/apt" = "orange",
    "Hotel room" = "yellow",
    "Shared room" = "green"
  ))
Answer: The graph is a bar plot of average Airbnb prices by room type.
“avg_price” is a summary column I created by grouping the listings by
room type and then calculating the mean price within each group. I chose
a bar plot because the two variables I wanted to compare are room type,
which is categorical, and price, which is numerical, and a bar plot
suits that pairing. I found it interesting that the shared room type is,
on average, the most expensive. I would have expected hotel rooms to be
the most expensive, as they tend to have additional fees.
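A surprising group mean like this one is worth checking, because a handful of very high prices can pull an average up. The sketch below is one possible way to do that, not part of the original analysis: it adds the listing count and the median price next to the mean for each room type. The `toy` data frame and its prices are made up purely for illustration and are not taken from Airbnb_DC_25.csv; replacing `toy` with `df` would run the same summary on the real data.

```r
library(dplyr)

# Hypothetical toy rows for illustration only (NOT real Airbnb values).
# Swap `toy` for `df` to run the same check on the actual dataset.
toy <- data.frame(
  room_type = c("Shared room", "Shared room", "Shared room",
                "Hotel room", "Hotel room", "Hotel room"),
  price = c(40, 50, 900, 200, 210, 220)
)

# Count, mean, and median per room type, sorted by mean price (highest first)
room_summary <- toy |>
  group_by(room_type) |>
  summarise(
    n_listings   = n(),
    avg_price    = mean(price, na.rm = TRUE),
    median_price = median(price, na.rm = TRUE)
  ) |>
  arrange(desc(avg_price))

room_summary
```

In the toy data, one 900-dollar listing puts the shared rooms first by mean even though their median is well below the hotel rooms' median. If the real data showed a similar gap between mean and median, or only a few listings in a group, that would suggest outliers rather than typical prices are behind the pattern.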