Assignment 5

Creating a Data Visualization

Import Statements

library(readxl)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
df <- read_csv("Airbnb_DC_25.csv")
view(df)

Creating a new data frame grouped by room type:

hotel_rooms <- df %>%
  group_by(room_type) %>% 
  summarize(
    # na.rm = TRUE drops missing values; otherwise a single NA would make the whole average NA
    avg_price = mean(price, na.rm = TRUE),
    avg_minimum_nights = mean(minimum_nights, na.rm = TRUE),
    avg_number_reviews = mean(number_of_reviews, na.rm = TRUE),
    .groups = "drop"
  )
view(hotel_rooms)
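
As a small illustration of why the missing values are removed before averaging, here is a minimal sketch on a made-up vector. The name toy_prices and its numbers are hypothetical and are not taken from the Airbnb dataset.

```r
# Hypothetical prices, NOT from Airbnb_DC_25; one value is missing
toy_prices <- c(100, 150, NA, 200)

# Without na.rm, a single NA makes the whole mean NA
mean_with_na <- mean(toy_prices)

# With na.rm = TRUE, the NA is dropped before averaging
mean_without_na <- mean(toy_prices, na.rm = TRUE)

print(mean_with_na)     # NA
print(mean_without_na)  # 150
```

The same thing would happen inside summarize(): any room type with even one missing price would get an average of NA without na.rm = TRUE.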

Creating bar graph:

ggplot(hotel_rooms, aes(x = room_type, y = avg_price, fill = room_type)) +
  # geom_col() plots the values as given, the same as geom_bar(stat = "identity")
  geom_col() +
  labs(
    x = "Room Type",
    y = "Average Price of Room",
    caption = "Dataset: Airbnb_DC_25",
    title = "Average Price by Room Type for Airbnbs in DC"
  )

Reflection Paragraph:

In this assignment, I wanted to compare the average room price across the different room types. Because room type is a categorical variable, I ultimately decided to use a bar graph. The graph shows that shared rooms have the highest average price, which is surprising because I would have expected entire homes to be more expensive. However, this might be due to an outlier: the mean is sensitive to extreme values, so a single very expensive shared-room listing could drastically raise that group's average.
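
One way to check the outlier idea would be to compare the mean with the median and the maximum for each room type. The sketch below uses a made-up data frame called toy, with hypothetical room types and prices rather than the real Airbnb_DC_25 data; the same summarize() call could be run on df instead, assuming it has the room_type and price columns used above.

```r
library(dplyr)

# Hypothetical listings, NOT the Airbnb_DC_25 data:
# one shared room has an extreme price
toy <- data.frame(
  room_type = c("Entire home", "Entire home", "Entire home",
                "Shared room", "Shared room", "Shared room"),
  price = c(180, 200, 220, 40, 50, 3000)
)

price_check <- toy %>%
  group_by(room_type) %>%
  summarize(
    n_listings   = n(),                           # how many listings are in each group
    avg_price    = mean(price, na.rm = TRUE),     # pulled up by extreme values
    median_price = median(price, na.rm = TRUE),   # much less affected by extreme values
    max_price    = max(price, na.rm = TRUE),      # shows the most extreme listing
    .groups = "drop"
  )

print(price_check)
```

If the mean for a room type is far above its median, as it is for the shared rooms in this toy example, that would suggest that a few very expensive listings are driving the average. A small n_listings would make the average even easier to distort.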