Assignment 6

Author

R Josue

library(readxl)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.1     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.3     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
df <- read_excel("Airbnb_DC_25.csv")
df
# A tibble: 6,257 × 18
      id name       host_id host_name neighbourhood_group neighbourhood latitude
   <dbl> <chr>        <dbl> <chr>     <lgl>               <chr>            <dbl>
 1  3686 Vita's Hi…    4645 Vita      NA                  Historic Ana…     38.9
 2  3943 Historic …    5059 Vasa      NA                  Edgewood, Bl…     38.9
 3  4197 Capitol H…    5061 Sandra    NA                  Capitol Hill…     38.9
 4  4529 Bertina's…    5803 Bertina   NA                  Eastland Gar…     38.9
 5  5589 Cozy apt …    6527 Ami       NA                  Kalorama Hei…     38.9
 6  7103 Lovely gu…   17633 Charlotte NA                  Spring Valle…     38.9
 7 11785 Sanctuary…   32015 Teresa    NA                  Cathedral He…     38.9
 8 12442 Peaches &…   32015 Teresa    NA                  Cathedral He…     38.9
 9 13744 Heart of …   53927 Victoria  NA                  Columbia Hei…     38.9
10 14218 Quiet Com…   32015 Teresa    NA                  Cathedral He…     38.9
# ℹ 6,247 more rows
# ℹ 11 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
#   minimum_nights <dbl>, number_of_reviews <dbl>, last_review <dttm>,
#   reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
#   availability_365 <dbl>, number_of_reviews_ltm <dbl>, license <chr>
room_summary <- df %>%
  group_by(room_type) %>%
  summarize(median_price = median(price, na.rm = TRUE))

room_summary
# A tibble: 4 × 2
  room_type       median_price
  <chr>                  <dbl>
1 Entire home/apt          146
2 Hotel room               292
3 Private room              68
4 Shared room               72
ggplot(room_summary, aes(x = room_type, y = median_price, fill = room_type)) +
  geom_col() +
  labs(title = "Median Price by the type of room in DC",
       x = "Type of Room", y = "Median Nightly Price", fill = "Type of Room",
       caption = "Source: Airbnb_DC_25.csv") +
  theme_minimal()

This graph shows the median nightly price for each type of room listed in DC. I used the median instead of the average because the average made shared rooms look more expensive, and that did not make sense. The median is less affected by a few very high prices, so this graph gives a more sensible picture than the previous one I had. The main pattern is that hotel rooms (median 292) and entire homes/apartments (median 146) are the most expensive, while private rooms (68) and shared rooms (72) are the cheapest.
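
To illustrate why the median can be a safer choice than the average, here is a minimal sketch using made-up prices. The `toy` data frame and all of its values are hypothetical and are not taken from Airbnb_DC_25.csv; the sketch simply assumes that one very expensive listing is what pulls the average up.

```r
library(dplyr)

# Hypothetical toy prices, NOT from Airbnb_DC_25.csv.
# The 1000 in the shared-room group stands in for one outlier listing.
toy <- data.frame(
  room_type = c(rep("Private room", 4), rep("Shared room", 4)),
  price     = c(60, 65, 70, 75, 40, 45, 50, 1000)
)

# Compute both statistics side by side for each room type.
toy_summary <- toy %>%
  group_by(room_type) %>%
  summarize(
    mean_price   = mean(price, na.rm = TRUE),
    median_price = median(price, na.rm = TRUE)
  )

toy_summary
```

In this toy data the single outlier raises the shared-room average well above the private-room average, while the shared-room median stays below the private-room median. Running the same `summarize()` call on `df` would show whether something similar happens in the real listings.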