Assignment 5

Author

Arinze Ugbah

Quarto

Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see https://quarto.org.

Running Code

When you click the Render button a document will be generated that includes both content and the output of embedded code. You can embed code like this:

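library(readxl)
# read_excel() parses Excel workbooks, so this assumes the file holds
# Excel-format data despite its .csv extension
df <- read_excel("Airbnb_DC_25.csv")
df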
# A tibble: 6,257 × 18
      id name       host_id host_name neighbourhood_group neighbourhood latitude
   <dbl> <chr>        <dbl> <chr>     <lgl>               <chr>            <dbl>
 1  3686 Vita's Hi…    4645 Vita      NA                  Historic Ana…     38.9
 2  3943 Historic …    5059 Vasa      NA                  Edgewood, Bl…     38.9
 3  4197 Capitol H…    5061 Sandra    NA                  Capitol Hill…     38.9
 4  4529 Bertina's…    5803 Bertina   NA                  Eastland Gar…     38.9
 5  5589 Cozy apt …    6527 Ami       NA                  Kalorama Hei…     38.9
 6  7103 Lovely gu…   17633 Charlotte NA                  Spring Valle…     38.9
 7 11785 Sanctuary…   32015 Teresa    NA                  Cathedral He…     38.9
 8 12442 Peaches &…   32015 Teresa    NA                  Cathedral He…     38.9
 9 13744 Heart of …   53927 Victoria  NA                  Columbia Hei…     38.9
10 14218 Quiet Com…   32015 Teresa    NA                  Cathedral He…     38.9
# ℹ 6,247 more rows
# ℹ 11 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
#   minimum_nights <dbl>, number_of_reviews <dbl>, last_review <dttm>,
#   reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
#   availability_365 <dbl>, number_of_reviews_ltm <dbl>, license <chr>
library(ggplot2)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
# 1. Keep listings priced above $0 and below $1,000 to remove extreme outliers for a clearer plot
plot_data <- df %>%
  filter(price > 0 & price < 1000) %>%
  select(price, number_of_reviews, room_type)

# 2. Create a scatterplot of price against number of reviews, colored by room type
ggplot(plot_data, aes(x = number_of_reviews, y = price, color = room_type)) +
  geom_point(alpha = 0.6) +
  labs(
    title = "Airbnb: Price vs. Reviews",
    x = "Number of Reviews",
    y = "Price per Night ($)",
    caption = "Source: Airbnb_DC_25.csv dataset",
    color = "Accommodation Type"
  ) +
  theme_minimal() +
  scale_color_brewer(palette = "Set1") # Ensures at least two distinct colors

In this visualization, my goal was to explore the relationship between the nightly price of an Airbnb listing and its total number of reviews, broken down by room type. Using the filter command, I removed listings priced at $0 or at $1,000 and above so that extreme outliers would not skew the plot. Looking at the visualization, and treating the number of reviews as a rough proxy for bookings, we can see that more people book Airbnbs in the lower to mid price range.
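
One way to back up this reading of the plot with numbers is sketched below. It is only a minimal sketch and not part of the original analysis: the helper name `reviews_by_price_band`, the price band edges, and the toy rows are all assumptions made for illustration. The helper applies the same price filter as the plot, groups listings into price bands, and reports the number of listings and the median review count for each room type and band.

```r
library(dplyr)

# Hypothetical helper: summarise review counts within price bands.
# The band edges (0, 100, 200, 500, 1000) are illustrative assumptions.
reviews_by_price_band <- function(data, breaks = c(0, 100, 200, 500, 1000)) {
  data %>%
    filter(price > 0, price < 1000) %>%               # same outlier rule as the plot
    mutate(price_band = cut(price, breaks = breaks)) %>%
    group_by(room_type, price_band) %>%
    summarise(
      listings = n(),                                 # listings in this band
      median_reviews = median(number_of_reviews),     # typical review count
      .groups = "drop"
    )
}

# Toy data for illustration only; these are NOT rows from Airbnb_DC_25.csv.
toy <- tibble(
  price = c(80, 95, 150, 450, 1200),
  number_of_reviews = c(120, 60, 40, 5, 2),
  room_type = c("Private room", "Private room", "Entire home/apt",
                "Entire home/apt", "Entire home/apt")
)

band_summary <- reviews_by_price_band(toy)
print(band_summary)
```

Calling `reviews_by_price_band(df)` on the full dataset would give one row per room type and price band. If the median review count is higher in the lower bands, that would support the pattern described above, although reviews remain only an indirect measure of bookings.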