── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.2.0 ✔ readr 2.2.0
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.2 ✔ tibble 3.3.1
✔ lubridate 1.9.5 ✔ tidyr 1.3.2
✔ purrr 1.2.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
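# read_excel() comes from the readxl package; if the file is a plain-text CSV, readr::read_csv() is the appropriate reader
library(readxl)
df <- read_excel("Airbnb_DC_25.csv")
view(df)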
Creating a new data frame that summarizes the listings by room type:
hotel_rooms <- df %>%
  group_by(room_type) %>%
  summarize(
    # Remove NA values; otherwise a single NA would make the whole average NA
    avg_price = mean(price, na.rm = TRUE),
    avg_minimum_nights = mean(minimum_nights, na.rm = TRUE),
    avg_number_reviews = mean(number_of_reviews, na.rm = TRUE),
    .groups = "drop"
  )
view(hotel_rooms)
Creating bar graph:
ggplot(hotel_rooms, aes(x = room_type, y = avg_price, fill = room_type)) +
  # stat = "identity" plots the precomputed averages instead of counting rows
  geom_bar(stat = "identity") +
  labs(
    x = "Room Type",
    y = "Average Price of Room",
    caption = "Dataset: Airbnb_DC_25",
    title = "Average Price by Room Type for Airbnbs in DC"
  )
Reflection Paragraph:
In this assignment, I wanted to compare the average room price across room types. Because room type is a categorical variable and average price is numeric, I decided to use a bar graph. The graph shows that shared rooms have the highest average price, which is surprising because I would have expected entire homes to be more expensive. However, this might be due to an outlier: a mean is sensitive to extreme values, so one very expensive shared-room listing could pull the average up substantially.
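One way to check the outlier explanation is to compare the mean with the median and the number of listings for each room type. The sketch below uses a small made-up data frame (`toy`, with invented prices) rather than the actual Airbnb_DC_25 data, so its numbers are purely illustrative. Only the column names `room_type` and `price` are taken from the code above.

```r
library(dplyr)

# Hypothetical toy data, NOT the real dataset:
# the shared rooms include one extreme price to mimic an outlier.
toy <- data.frame(
  room_type = c("Entire home", "Entire home", "Entire home",
                "Shared room", "Shared room", "Shared room"),
  price = c(150, 200, 250, 40, 50, 5000)
)

# Summarize each room type with statistics that react differently to outliers.
price_check <- toy %>%
  group_by(room_type) %>%
  summarize(
    n_listings = n(),                              # how many listings back each average
    avg_price = mean(price, na.rm = TRUE),         # sensitive to extreme values
    median_price = median(price, na.rm = TRUE),    # robust to extreme values
    max_price = max(price, na.rm = TRUE),          # shows the most extreme listing
    .groups = "drop"
  )

print(price_check)
```

In this toy example the shared rooms have the higher mean but the lower median, which is the pattern an outlier would produce. Running the same `summarize()` call on `df` would show whether the real data behave similarly; a shared-room median well below its mean, or a very small shared-room count, would support the outlier explanation.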